Birds migrate. But why do data warehouses?

Well, let’s be specific here. Birds migrate either north or south. Data warehouses are only going in one direction. Up, to the cloud.

It’s a common trend we’re seeing across every vertical and across every region. Companies are moving their existing data warehouses to cloud environments like Amazon Redshift. And more often than not, unlike their feathered counterparts, once they migrate to the cloud, they never come back.

But why?

Simply put, it just makes sense.

The cloud has fundamentally changed the game when it comes to using data effectively. There are far fewer constraints on resources, procurement, scale, setup, and speed. In other words, cloud computing has made most on-premises data center strategies obsolete. So why wouldn’t you invest in it?

Now that I’ve got your attention, let’s get into the nitty-gritty details.

Cloud data warehouses are awesome, but they’re not perfectly wrapped packages. How will you get the data from all your cloud and on-premises applications and systems into them? How will you make sure the data is right? You still need to make the projects built on them successful. You still need to make them work with everything else. This is not trivial. In fact, it’s really, really hard.

That’s where we come in.

In a typical scenario, companies use a data lake like Amazon S3 to hold structured and unstructured data. The biggest challenge at this stage is simply getting the data in there. Talend speeds up ingestion with over 1,000 connectors and components. It doesn’t matter whether the data comes from Teradata, Salesforce, DB2, or any other system. You don’t need to worry about version numbers, data formats, or even where the system resides. We make it effortless to feed the data lake.

Next, there’s a fundamental need to standardize, parse, and cleanse the data. This is where the G-word comes in. That’s right: Governance. In the end, governed data is useful data, and that’s what we need to drive projects and make them successful. At this stage we’re paring down the data and throwing out the bits we don’t need. To do this effectively, we can combine machine learning and big data processing with insight from people who understand the business. Machine learning can help identify data that needs to be fixed and automatically correct it based on prior corrections. Tools like Data Preparation and Data Stewardship let different users (including business analysts) contribute to refining the data. Finally, we generate native code that runs on big data services like Amazon EMR to consolidate large data sets, so that only the important, concise data remains.
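To make the standardize, parse, and cleanse step a little more concrete, here is a minimal Python sketch. It is not how Talend implements this; the record fields, the `PRIOR_CORRECTIONS` table, and the function names are all hypothetical. A plain lookup of previously approved fixes stands in for the machine-learning step, so it only reapplies corrections that someone has already made.

```python
from typing import Dict, List, Optional

# Hypothetical record shape: dicts with "name", "country", and "revenue" fields.
Record = Dict[str, str]

# Hypothetical store of corrections a data steward has already approved.
# This simple lookup stands in for machine learning: it only reapplies
# fixes that were made before, it does not learn new ones.
PRIOR_CORRECTIONS: Dict[str, Dict[str, str]] = {
    "country": {"usa": "US", "united states": "US", "u.s.": "US", "deutschland": "DE"},
}


def standardize(field: str, value: str) -> str:
    """Trim whitespace, then reapply any prior correction for this field."""
    cleaned = value.strip()
    fixes = PRIOR_CORRECTIONS.get(field, {})
    return fixes.get(cleaned.lower(), cleaned)


def parse_revenue(value: str) -> Optional[float]:
    """Parse a revenue string like '$1,200.50'; return None if unparseable."""
    try:
        return float(value.strip().replace("$", "").replace(",", ""))
    except ValueError:
        return None


def cleanse(records: List[Record]) -> List[dict]:
    """Standardize and parse records, dropping rows that fail basic checks."""
    result = []
    for rec in records:
        name = standardize("name", rec.get("name", ""))
        country = standardize("country", rec.get("country", ""))
        revenue = parse_revenue(rec.get("revenue", ""))
        # Pare down: throw out rows with no name or an unparseable revenue.
        if not name or revenue is None:
            continue
        result.append({"name": name, "country": country, "revenue": revenue})
    return result


if __name__ == "__main__":
    raw = [
        {"name": " Acme Corp ", "country": "USA", "revenue": "$1,200.50"},
        {"name": "Globex", "country": "Deutschland", "revenue": "980"},
        {"name": "", "country": "US", "revenue": "10"},
        {"name": "Initech", "country": "u.s.", "revenue": "n/a"},
    ]
    for row in cleanse(raw):
        print(row)
```

Running it keeps only the Acme Corp and Globex rows, with their countries normalized to `US` and `DE`. In a real pipeline the correction table would come from stewardship decisions rather than being hard-coded.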

All this clean and governed data can now be moved into a cloud data warehouse like Redshift for optimized performance in a structured data environment. At this point, the cloud data warehouse serves as the quick and clean repository for all uses of this trusted data. Analytics is a huge use case for cloud data warehouses.

In fact, you can see a terrific example of this in this video featuring the University of Pennsylvania, which modernized a legacy application with a hybrid AWS implementation and reduced its runtime by 3x.


To learn more about how a modern data warehouse on AWS can drive business results, check out the Cloud Architects’ Handbook on How Leading Enterprises Achieve Business Transformation with Talend and AWS. I hope this helps you on your migratory journey to cloud data warehousing.

Have questions? Feel free to reach out to us and we’ll help you out. 

By the way, if you happen to be in Las Vegas in early December for AWS re:Invent 2019, please drop by our booth, #613! We’d love to show you all of this in person!

