Every year, the concept of the “data lake” gains more traction in the enterprise. Organizations seeking to leverage the power of data for business transformation increasingly look to cloud data lakes to collect and analyze structured, semi-structured, and unstructured data and turn it into business insights.
However, while the data lake is now a widely accepted concept both on-prem and in the cloud, organizations still have trouble making data lakes usable and filling them with clean, reliable data. In fact, Gartner has predicted that through 2018, 90% of deployed data lakes will be useless. This is largely due to the diverse, often large, and increasingly complex combinations of data sources and data models that are emerging more than ever before.
Given this market situation, we’ve teamed with Cognizant and Amazon Web Services to build a complete Quick Start solution for deploying data lakes on AWS. Essentially, this is a complete data architecture to help organizations get data lakes up and running quickly in the cloud. It is intended for users who are evaluating big data in the cloud or looking to accelerate their big data initiatives by adopting best practices for big data integration.
I want to take a few moments to go through some of the key features of this new solution.
Data Lake Quick Start features:
The new Quick Start is really all about simplicity for customers. We want everyone and anyone to be successful deploying data lakes in the cloud. With that thought in mind, we’ve included features that will help you get up and running on AWS in minutes, then evaluate in days rather than weeks.
The Quick Start provides the following features:
- Enables self-service by provisioning required services and components to build a data lake.
- Provides flexibility to spin up environments for development, test, and production.
- Includes an optional sample dataset and prebuilt Talend Spark jobs that help you explore the architecture and understand the stages of the end-to-end dataflow.
- Offers the Cognizant ingestion framework, big data validation, and DevOps platform to ingest, validate, and deploy big data solutions.
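As a rough illustration of the development, test, and production environments mentioned in the list above, the sketch below builds one set of deployment parameters per environment. It assumes the Quick Start is launched as an AWS CloudFormation stack, which is how AWS Quick Starts are generally packaged. The parameter names (`Environment`, `NodeCount`, `IncludeSampleData`) and the `datalake` stack prefix are hypothetical and are not taken from the actual Quick Start template.

```python
# Hypothetical per-environment settings. The keys and values are
# illustrative only and do not come from the real Quick Start template.
ENVIRONMENTS = {
    "dev": {"NodeCount": "2", "IncludeSampleData": "yes"},
    "test": {"NodeCount": "3", "IncludeSampleData": "yes"},
    "prod": {"NodeCount": "6", "IncludeSampleData": "no"},
}


def stack_name(env, prefix="datalake"):
    """Return a distinct stack name for each environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    return f"{prefix}-{env}"


def stack_parameters(env):
    """Build parameters in the ParameterKey/ParameterValue list shape
    that CloudFormation stack creation accepts."""
    settings = {"Environment": env, **ENVIRONMENTS[env]}
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in settings.items()
    ]


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(stack_name(env), stack_parameters(env))
```

Keeping the settings in one table makes the differences between environments easy to review, for example loading the optional sample dataset only outside production.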
Data Integration Architecture in Quick Start
In this reference architecture, each dataflow is designed in Talend Open Studio and orchestrated by the Talend Big Data Platform. Talend Open Studio helps you create job templates using an easy-to-understand visual interface. It also provides metadata management capabilities.
The Talend Big Data Platform then runs these jobs to take the data through the flow (see figure 2). The Quick Start includes a number of sample, prebuilt jobs that demonstrate the flow and let you test the results of the system.
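To make the idea of jobs taking data through a flow more concrete, here is a minimal plain-Python sketch of a staged dataflow. It is not Talend code and does not reflect the actual prebuilt jobs. The stage names (`ingest`, `validate`, `summarize`) and the `id,amount` record format are assumptions made purely for illustration.

```python
def ingest(raw_lines):
    """Parse 'id,amount' text lines into records (hypothetical format)."""
    records = []
    for line in raw_lines:
        record_id, amount = line.strip().split(",")
        records.append({"id": record_id, "amount": amount})
    return records


def validate(records):
    """Keep only records whose amount is a non-negative number."""
    valid = []
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # drop records that fail to parse
        if amount >= 0:
            valid.append({"id": record["id"], "amount": amount})
    return valid


def summarize(records):
    """Reduce the validated records to a single summary row."""
    total = sum(record["amount"] for record in records)
    return [{"count": len(records), "total": total}]


def run_flow(data, stages):
    """Pass the data through each stage in order, like an orchestrator."""
    for stage in stages:
        data = stage(data)
    return data


result = run_flow(
    ["a,10.5", "b,oops", "c,4.5", "d,-1"],
    [ingest, validate, summarize],
)
print(result)  # [{'count': 2, 'total': 15.0}]
```

Each stage is a separate function, so a single stage can be tested or swapped out without touching the rest of the flow, which is the same benefit the prebuilt jobs aim to show at a larger scale.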
With this comprehensive, out-of-the-box solution, organizations can start delivering breakthrough insights in just a few weeks or months using powerful AWS analytics services such as Amazon QuickSight and Amazon Machine Learning (Amazon ML). The features above are just a few reasons to check out our new Quick Start data lake solution with Cognizant and AWS.