A Simple Architecture for Building a Big Data Lake on Azure with Talend Cloud

Big data has become one of the most important tools businesses use to shape their future. Major companies like Amazon, Uber, and Netflix use big data to fuel rapid innovation in everything from customer engagement to new product development to business optimization strategy. The rise of big data technologies such as Hadoop, Spark, Kubernetes, and Kafka, combined with the promise of the cloud, has empowered countless enterprises to execute their big data initiatives more easily. Companies moving to the cloud are already reaping benefits such as faster provisioning, shorter time to market, flexibility and agility, instant scalability, and reduced overall IT and business costs, to name a few.

Getting started with Azure and Talend Cloud

Among the leading cloud platforms, one of the most widely adopted is Microsoft Azure, a secure, flexible, enterprise-grade cloud platform that offers IaaS, PaaS, and SaaS services along with many development tools and frameworks that can help you create a data lake and deliver enterprise big data analytics.

Meanwhile, Talend Cloud is an open, highly scalable cloud integration (iPaaS) solution that simplifies your data and app integrations. Talend Cloud brings:

  • Broad connectivity to on-premises databases, SaaS apps, cloud apps, Azure Blob Storage, Azure Data Lake Store, Azure HDInsight, Azure SQL Data Warehouse, Azure CosmosDB, and more
  • Native Spark and Hadoop support
  • Built-in data quality
  • Self-service capabilities such as data prep, data stewardship, and data governance
  • Enterprise capabilities like SDLC and multi-cloud support

Creating a Big Data Lake on Azure for accurate and reliable data

Talend and Azure have been working together to provide our joint customers with a hyper-scale cloud data lake solution that can deliver actionable insights. But first, what is a data lake? A data lake is an architecture that allows organizations to store massive amounts of data in a central repository. Typically, this includes data of various types and from multiple sources, readily available to be categorized, processed, analyzed, and consumed by diverse groups within the organization. Data lakes help eliminate data silos and capture 360-degree views of organization, customer, and partner data. Compared to traditional data storage and analytics, data lakes deliver more agility and flexibility, especially when built in a cloud environment. A data lake architecture can respond quickly when rapid changes are needed, such as adopting new IT solutions, connecting to new data types and sources, and performing new types of analytics.

The following diagram shows how a typical customer implements a data lake solution using Azure and Talend Cloud:

In this simplified use case, you ingest your structured or unstructured data from the web, social media, machine sensors, devices, or on-premises applications into Azure Data Lake Store (ADL Store), a hyper-scale Hadoop file system for big data analytics workloads. It is compatible with the Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem.

Talend Cloud then helps you profile the data stored on ADL Store and apply your requirements for data governance, business rules, and regulatory compliance. Next, you use Talend's built-in Data Quality natively in Azure HDInsight to prepare the data for analysis. Finally, you move the transformed and cleansed data to Azure SQL Data Warehouse, where business analysts can access it directly for BI reports.
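To make the flow above more concrete, here is a minimal Python sketch of the same four stages: ingest, profile, cleanse, and load. It is an illustration only and does not use Talend or Azure APIs. The sample feed, the column names, the cleansing rules, and every function name are hypothetical, and an in-memory SQLite database stands in for Azure SQL Data Warehouse.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical raw feed; in the architecture above this would be a file on ADL Store.
RAW_CSV = """customer_id,email,country
1,ana@example.com,US
2,,us
3,bob@example,DE
3,bob@example,DE
4,carol@example.com,
"""


def ingest(raw_text):
    """Stage 1: read raw records as-is, without fixing anything yet."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def profile(records):
    """Stage 2: count missing values per column and exact duplicate rows."""
    missing = Counter()
    for row in records:
        for column, value in row.items():
            if not value:
                missing[column] += 1
    seen = Counter(tuple(sorted(row.items())) for row in records)
    duplicates = sum(count - 1 for count in seen.values())
    return {"rows": len(records), "missing": dict(missing), "duplicates": duplicates}


def cleanse(records):
    """Stage 3: apply illustrative rules (assumed, not Talend's):
    drop rows without an email, keep one row per customer_id,
    and normalize the country code."""
    cleaned, seen_ids = [], set()
    for row in records:
        if not row["email"] or row["customer_id"] in seen_ids:
            continue
        seen_ids.add(row["customer_id"])
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "email": row["email"].lower(),
            "country": row["country"].upper() or "UNKNOWN",
        })
    return cleaned


def load(records, connection):
    """Stage 4: write cleansed rows to a table that BI queries can read."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, email TEXT, country TEXT)"
    )
    connection.executemany(
        "INSERT INTO customers VALUES (:customer_id, :email, :country)", records
    )
    connection.commit()


def run_pipeline(raw_text, connection):
    """Run all four stages and return the profiling report."""
    raw = ingest(raw_text)
    report = profile(raw)
    load(cleanse(raw), connection)
    return report


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    print(run_pipeline(RAW_CSV, warehouse))
    print(warehouse.execute("SELECT * FROM customers").fetchall())
```

The sketch profiles the raw data before cleansing it, so the report describes what actually arrived in the lake rather than what survived the rules. In a real deployment each stage would run on the services named above rather than in a single process.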

Using Talend, many companies speed up ingestion into their Microsoft Azure Data Lake by 50%. Watch the video below to learn how Talend Cloud is helping customers move to the cloud, or experience Talend Cloud firsthand by signing up for a free trial today.
