Accelerate the Move to Cloud Analytics with Talend, Snowflake and Cognizant
In the last few years, the concept of the “Cloud Data Lake” has gained traction in the enterprise. When done right, a data lake can provide the agility that digital transformation around customer experience requires by enabling access to historical and real-time data for analytics.
However, while the data lake is now a widely accepted concept both on-premises and in the cloud, organizations still have trouble making their data lakes usable and filling them with clean, reliable data. In fact, Gartner has predicted that through 2018, 90% of deployed data lakes will be useless. This is largely due to the diverse and complex combinations of data sources and data models, which are more common than ever before.
Migrating on-premises enterprise analytics to the cloud requires significant effort before it delivers value. Cognizant has shortened that time to value with a new Data Lake Quickstart solution. In this blog, I want to show you how this new Quickstart lets you run cloud analytics migration projects significantly faster, delivering in weeks instead of months and with lower risk.
Cognizant Data Lake Quickstart with Talend on Snowflake
First, let’s go into detail on what this Quickstart solution consists of. The Cognizant Data Lake Quickstart Solution includes:
- A data lake reference architecture based on:
  - Snowflake, the data warehouse built for the cloud
  - Talend Cloud platform
  - Amazon S3 and Amazon RDS
- Data migration from on-premises data warehouses (Teradata/Exadata/Netezza) to Snowflake using metadata migration
- Pre-built jobs for data ingestion and processing (pushdown to Snowflake and EMR)
How It Works
- Uses Talend to extract structured and semi-structured data files from on-premises systems and ingest them into Amazon S3, using a metadata-based approach that stores the data quality rules and target layout
- Stores data on Amazon S3 as an enterprise data lake for processing
- Leverages the Talend Snowflake data loader to move files to Snowflake from Amazon S3
- Runs Talend jobs that connect to Snowflake and process the data
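To make the metadata-based flow above a little more concrete, here is a minimal Python sketch. It is not the Quickstart's actual implementation, which is built from Talend jobs. It only illustrates the idea that one metadata entry can drive both a simple data quality check and the Snowflake load from S3. The `SourceMetadata` class, the `datalake_stage` external stage, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMetadata:
    """Hypothetical metadata entry describing one source feed."""
    target_table: str             # Snowflake table to load into
    stage_path: str               # external stage (pointing at S3) plus path
    file_type: str = "CSV"        # file format type used by COPY INTO
    skip_header: int = 1          # header rows to skip (CSV only)
    not_null_columns: tuple = ()  # simple data quality rule


def failed_not_null(meta: SourceMetadata, row: dict) -> list:
    """Return the columns in `row` that break the not-null rule."""
    return [c for c in meta.not_null_columns if row.get(c) in (None, "")]


def build_copy_statement(meta: SourceMetadata) -> str:
    """Build a Snowflake COPY INTO statement from the metadata entry."""
    options = f"TYPE = {meta.file_type}"
    if meta.file_type.upper() == "CSV":
        # SKIP_HEADER is a CSV-only file format option.
        options += f" SKIP_HEADER = {meta.skip_header}"
    return (
        f"COPY INTO {meta.target_table}\n"
        f"FROM @{meta.stage_path}\n"
        f"FILE_FORMAT = ({options});"
    )


# Assumed example values, for illustration only.
orders = SourceMetadata(
    target_table="STAGING.ORDERS",
    stage_path="datalake_stage/orders/",
    not_null_columns=("order_id",),
)
print(failed_not_null(orders, {"order_id": "", "status": "NEW"}))
print(build_copy_statement(orders))
```

Onboarding another feed in this sketch means adding another `SourceMetadata` entry rather than writing new load code, which is the point of the metadata-based approach.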
Data Migration from On-premises Data Warehouse (Teradata/Exadata/Netezza) to Snowflake
For data migration projects, the metadata-based migration framework leverages Talend and Snowflake. A Talend ETL process captures both source and target (Snowflake) metadata (schemas, tables, columns, and data types) in the metadata repository. The data migration itself is executed using Talend and the Snowflake Copy utility.
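As a rough illustration of how captured source metadata can be turned into target definitions, the Python sketch below maps a few Oracle-style column types to Snowflake types and generates a `CREATE TABLE` statement. The type mapping, the function names, and the sample table are assumptions made for this example. The real framework keeps this information in its metadata repository and would cover far more types.

```python
import re

# Illustrative (assumed) mapping from Oracle-style source types to Snowflake.
TYPE_MAP = {
    "VARCHAR2": "VARCHAR",
    "NUMBER": "NUMBER",
    "DATE": "TIMESTAMP_NTZ",  # Oracle DATE also carries a time component
    "CLOB": "VARCHAR",
}
# Source types whose (length) or (precision, scale) carries over unchanged.
KEEPS_ARGS = {"VARCHAR2", "NUMBER"}

# Matches a type name with optional numeric arguments, e.g. NUMBER(10,2).
_TYPE_PATTERN = re.compile(
    r"\s*([A-Za-z0-9_]+)\s*(\(\s*\d+\s*(?:,\s*\d+\s*)?\))?\s*"
)


def to_snowflake_type(source_type: str) -> str:
    """Translate one source column type into a Snowflake column type."""
    match = _TYPE_PATTERN.fullmatch(source_type)
    if match is None:
        raise ValueError(f"Cannot parse source type: {source_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    if base not in TYPE_MAP:
        raise ValueError(f"No mapping defined for source type: {base}")
    target = TYPE_MAP[base]
    return target + args if base in KEEPS_ARGS else target


def build_create_table(table: str, columns: list) -> str:
    """Generate Snowflake DDL from (column_name, source_type) pairs."""
    lines = ",\n".join(
        f"    {name} {to_snowflake_type(src_type)}" for name, src_type in columns
    )
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{lines}\n);"


# Assumed sample metadata, for illustration only.
print(build_create_table(
    "STAGING.CUSTOMERS",
    [("CUSTOMER_ID", "NUMBER(10,0)"), ("NAME", "VARCHAR2(50)"), ("CREATED", "DATE")],
))
```

Unmapped or unparseable types raise an error instead of being guessed, so gaps in the mapping surface before any data is moved.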
Pre-built Jobs for Data Ingestion and Processing
For incremental data loads, Cognizant has included pre-built Talend jobs that load data from source systems into the Amazon S3 layer and then into Snowflake Staging. These jobs then transform and load the data into Snowflake Presentation layer tables using Snowflake-compatible SQL. Alternatively, pre-built jobs can use the Amazon S3 layer to build a conformed layer in S3 using Amazon EMR and Talend Spark components, and then load the conformed data directly into Snowflake Presentation layer tables.
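One common way to express the Staging-to-Presentation step in Snowflake-compatible SQL is a `MERGE` statement, which updates rows that already exist and inserts the ones that don't. The Python sketch below generates such a statement from a list of key and value columns. It is only an assumption about what pushdown SQL of this kind could look like, not the SQL the pre-built jobs actually emit, and the schema, table, and column names are hypothetical.

```python
def build_merge_statement(target, staging, key_columns, value_columns):
    """Build a Snowflake MERGE that upserts staging rows into the target."""
    if not key_columns or not value_columns:
        raise ValueError("At least one key column and one value column are required")
    all_columns = list(key_columns) + list(value_columns)
    # Rows are matched on the key columns.
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_columns)
    # Matched rows get their non-key columns refreshed from staging.
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in value_columns)
    insert_cols = ", ".join(all_columns)
    insert_vals = ", ".join(f"s.{c}" for c in all_columns)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {staging} s\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals});"
    )


# Assumed example tables and columns, for illustration only.
print(build_merge_statement(
    target="PRESENTATION.ORDERS",
    staging="STAGING.ORDERS",
    key_columns=["order_id"],
    value_columns=["status", "amount"],
))
```

Because the statement runs inside Snowflake, the transformation work is pushed down to the warehouse instead of being done row by row in the job itself.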
So, what are the benefits of this Quickstart architecture? Let's review:
- Cost optimization – Up to 50% reduction in initial setup effort to migrate to Snowflake
- Simplification – Template-based approach to facilitate infrastructure setup and Talend jobs
- Faster time to market – Deliver in weeks instead of months
- Agility – Changes to a migration mostly involve changing metadata only, with no code changes. A self-service mechanism lets you onboard new sources, configurations, environments, etc. just by providing metadata, with minimal Talend technical expertise. It's also easy to maintain, since all data migration configurations are kept in a single metadata repository.
Now go out and get your cloud data lake up and running quickly. Comment below and let me know what you think!