Accelerate the Move to Cloud Analytics with Talend, Snowflake and Cognizant

  • Sudharsanan Narasimharaghavan
    Sudharsanan Alisoor is a TOGAF-certified Senior ETL Architect with more than 15 years of experience across the System Development Life Cycle (analysis, design, development, testing, evaluation and implementation), with extensive exposure to data architecture, database applications, data warehouse and ETL process technology. Mr. Sudharsanan has a proven track record of managing complex multi-organization global programs and large projects, and strong international experience managing customer programs for the North America and Asia Pacific regions.

In the last few years, the concept of the “Cloud Data Lake” has gained traction in the enterprise. When done right, a data lake can provide the agility needed for digital transformation around customer experience by enabling access to historical and real-time data for analytics.

However, while the data lake is now a widely accepted concept both on-premises and in the cloud, organizations still have trouble making data lakes usable and filling them with clean, reliable data. In fact, Gartner predicted that through 2018, 90% of deployed data lakes would be useless. This is largely due to the diverse and complex combinations of data sources and data models that are appearing more than ever before.

Migrating on-premises enterprise analytics to the cloud requires significant effort before it delivers value. Cognizant has just accelerated your time to value with a new Data Lake Quickstart solution. In this blog, I want to show you how this new Quickstart can help you run cloud analytics migration projects significantly faster, delivering in weeks instead of months and with lower risk.

Cognizant Data Lake Quickstart with Talend on Snowflake

First, let’s go into detail on what this Quickstart solution contains. The Cognizant Data Lake Quickstart Solution includes:

  • A data lake reference architecture based on:
    • Snowflake, the data warehouse built for the cloud
    • Talend Cloud platform
    • Amazon S3 and Amazon RDS
  • Data migration from on-premises data warehouses (Teradata/Exadata/Netezza) to Snowflake using metadata migration
  • Pre-built jobs for data ingestion and processing (pushdown to Snowflake and EMR)

How It Works

  • Uses Talend to extract data files (structured/semi-structured) from on-premises systems and ingest them into Amazon S3, using a metadata-based approach to store data quality rules and the target layout
  • Stores data on Amazon S3 as an enterprise data lake for processing
  • Leverages the Talend Snowflake data loader to move files to Snowflake from Amazon S3
  • Runs Talend jobs that connect to Snowflake and process the data
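
To make the metadata-based approach in the first step more concrete, here is a minimal Python sketch. It is not the Quickstart's actual Talend implementation. The `FeedMetadata` record, the rule functions, the feed name and the S3 key layout are all hypothetical and only illustrate how data quality rules and a target layout could be stored as metadata and applied to incoming rows before they land in the data lake.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


@dataclass
class FeedMetadata:
    """Hypothetical metadata record: one entry per source feed."""
    source_name: str
    target_columns: List[str]  # target layout
    rules: Dict[str, Callable[[str], bool]] = field(default_factory=dict)  # data quality rule per column


def s3_key(meta: FeedMetadata, load_date: str, file_name: str) -> str:
    """Build a date-partitioned object key (this layout is an assumption)."""
    return f"raw/{meta.source_name}/load_date={load_date}/{file_name}"


def apply_rules(meta: FeedMetadata, rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split rows into accepted (projected onto the target layout) and rejected."""
    accepted: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        passes = all(rule(row.get(col, "")) for col, rule in meta.rules.items())
        if passes:
            accepted.append({col: row.get(col, "") for col in meta.target_columns})
        else:
            rejected.append(row)
    return accepted, rejected


# Onboarding a new feed means adding metadata, not writing new code.
customers = FeedMetadata(
    source_name="crm_customers",
    target_columns=["customer_id", "email"],
    rules={"customer_id": str.isdigit, "email": lambda value: "@" in value},
)

rows = [
    {"customer_id": "101", "email": "a@example.com", "extra": "x"},
    {"customer_id": "abc", "email": "b@example.com"},
]
accepted, rejected = apply_rules(customers, rows)
key = s3_key(customers, "2019-01-15", "part-0001.csv")
```

In this sketch the second row is rejected because its `customer_id` is not numeric, and the accepted row is trimmed to the columns named in the target layout.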

Data Migration from On-premises Data Warehouse (Teradata/Exadata/Netezza) to Snowflake

For data migration projects, the metadata-based migration framework leverages Talend and Snowflake. Both source and target (Snowflake) metadata (schemas, tables, columns and data types) are captured in the metadata repository using a Talend ETL process. The data migration itself is executed using Talend and the Snowflake COPY utility.
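
As a rough illustration of what metadata-driven migration can look like, the Python sketch below turns captured source column metadata into a Snowflake table definition and a `COPY INTO` statement. The type mapping is a small illustrative subset and an assumption on my part, not the framework's actual mapping. The table name and the `migration_stage` stage are hypothetical, and the sketch assumes an external stage over the S3 bucket already exists.

```python
from typing import Dict, List, Tuple

# Illustrative Teradata -> Snowflake type mapping (assumed, not exhaustive).
TYPE_MAP: Dict[str, str] = {
    "INTEGER": "NUMBER(38,0)",
    "DECIMAL": "NUMBER",
    "VARCHAR": "VARCHAR",
    "CHAR": "CHAR",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP_NTZ",
}

# (column name, source type, length/precision arguments or "")
Column = Tuple[str, str, str]


def snowflake_type(source_type: str, args: str = "") -> str:
    """Map a source type to a Snowflake type, carrying over length/precision."""
    target = TYPE_MAP[source_type.upper()]
    if args and "(" not in target:
        return f"{target}({args})"
    return target


def create_table_ddl(table: str, columns: List[Column]) -> str:
    """Generate the target table DDL from column metadata."""
    cols = ",\n".join(
        f"  {name} {snowflake_type(src_type, args)}" for name, src_type, args in columns
    )
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{cols}\n);"


def copy_into_sql(table: str, stage_path: str) -> str:
    """Generate a COPY INTO statement that loads CSV files from a stage."""
    return (
        f"COPY INTO {table} FROM @{stage_path} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);"
    )


# Metadata as it might be captured from the source warehouse (hypothetical).
orders_columns: List[Column] = [
    ("ORDER_ID", "INTEGER", ""),
    ("AMOUNT", "DECIMAL", "18,2"),
    ("STATUS", "VARCHAR", "20"),
]

ddl = create_table_ddl("STAGING.ORDERS", orders_columns)
copy_sql = copy_into_sql("STAGING.ORDERS", "migration_stage/orders/")
print(ddl)
print(copy_sql)
```

Because the statements are generated from the metadata repository, adding a table to the migration means adding rows of metadata rather than writing another job by hand.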

Pre-built Jobs for Data ingestion and Processing

For incremental data loads, Cognizant has included pre-built Talend jobs that load data from source systems into the Amazon S3 layer and then into Snowflake Staging. These jobs then transform the data and load it into Snowflake Presentation layer tables using Snowflake-compatible SQL. Alternatively, the pre-built jobs can build a conformed layer in Amazon S3 using AWS EMR and Talend Spark components, and then load the conformed data directly into Snowflake Presentation layer tables.
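
One common way to express an incremental load from a staging table into a presentation table in Snowflake-compatible SQL is a `MERGE` statement. The post does not say which statements the pre-built jobs actually issue, so treat the following Python sketch as an assumption: it generates a `MERGE` from a list of key and column names, and the table and column names are hypothetical.

```python
from typing import List


def merge_sql(target: str, staging: str, keys: List[str], columns: List[str]) -> str:
    """Generate a MERGE that upserts staging rows into the target table."""
    non_keys = [col for col in columns if col not in keys]
    if not keys or not non_keys:
        raise ValueError("need at least one key column and one non-key column")

    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    updates = ", ".join(f"t.{c} = s.{c}" for c in non_keys)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)

    return (
        f"MERGE INTO {target} t\n"
        f"USING {staging} s\n"
        f"  ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {updates}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals});"
    )


sql = merge_sql(
    target="PRESENTATION.ORDERS",
    staging="STAGING.ORDERS",
    keys=["ORDER_ID"],
    columns=["ORDER_ID", "AMOUNT", "STATUS"],
)
print(sql)
```

Since the statement runs entirely inside Snowflake, this style of job fits the pushdown approach mentioned earlier: the job orchestrates the load while the warehouse does the processing.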


So, what are the benefits of this Quickstart architecture? Let's review:

  • Cost optimization – Up to 50% reduction in initial setup effort to migrate to Snowflake
  • Simplification – Template-based approach to facilitate infrastructure setup and Talend jobs
  • Faster time to market – Deliver in weeks instead of months
  • Agility – Changes to a migration mainly consist of metadata changes, with no code change. A self-service mechanism lets you onboard new sources, configurations, environments, etc. just by providing metadata, with minimal Talend technical expertise. It's also easy to maintain, as all data migration configurations are kept in a single metadata repository.

Now go out and get your cloud data lake up and running quickly. Comment below and let me know what you think!

Join The Conversation


  1. Kevin says:

    I assume this architecture would work just as well with Azure Blob or Azure Data Lake Store Gen2?