Today’s data-driven enterprises know that expectations for data are high, but time, resources and the patience to deal with the massive influx of data are low. When it comes to data warehouses, the question seems to be not “if”, but “when” an enterprise will move to a cloud data warehouse.
Warehousing in the Data-driven World
The Data Warehousing Institute (TDWI) released a Data Warehouse Modernization report indicating that 48% of the enterprises surveyed are replacing, or have immediate plans to replace, their current data warehouse solution. For these enterprises, “modernization” will likely be a move to a cloud storage solution – away from the original data warehouses.
As if the tsunami of data wasn’t enough to manage, new regulatory measures regarding data privacy are emerging. Complicating things further are the new and varied data types that are emerging: unstructured data, log files, and IoT data, streaming in from a variety of data sources, including on-premise systems, the cloud, third parties, and SaaS apps. Finding the most efficient and secure way to gather data from all those sources so it can provide business intelligence (BI) can seem challenging.
The Original Data Warehouses
For many years, on-premises data warehouses were the only storage method available. The concept of the data warehouse has existed since the 1980s, when the transformative power of data was beginning to be harnessed. The first data warehouses were developed to transition data from merely powering operations to fueling decision support systems that leverage big data for analytics. The end goal, then and now, is to have all of this data processed and stored in the data warehouse so it’s easy for decision makers to access it to use for analytics and BI.
The Cost of On-premise Data Warehouses
An on-premise data warehouse comes at many costs; the first, and most quantifiable, is financial. The up-front cost of hardware and the ongoing cost of maintaining a conventional data warehouse can add up. Today, it can cost an enterprise $100,000 or more to add a terabyte of data to a data warehouse – and that’s not including the necessary hardware and additional time needed for support. It’s a lot of money for a legacy solution that can’t scale to meet data needs.
Alternatively, with a cloud data warehouse, there are no physical servers to buy or set up. Capacity isn’t an issue, so data can flow seamlessly at peak and slow times, which brings us to the major benefit of cloud: the low cost of data storage. With cloud data warehouses, enterprises only pay for the storage needed at the time, which can fluctuate wildly.
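To make the pay-for-what-you-use point concrete, here is a minimal sketch in Python that compares paying for peak capacity every month with paying only for the storage used each month. The function names, the usage figures, and the unit price of 1.0 are all hypothetical and chosen only for illustration; they are not real vendor prices, and the same unit price is used on both sides so that only the effect of fluctuating demand shows up.

```python
from typing import Sequence


def provisioned_cost(monthly_usage_tb: Sequence[float], price_per_tb_month: float) -> float:
    """Cost when capacity is sized for the peak month and paid for every month."""
    if not monthly_usage_tb:
        return 0.0
    return max(monthly_usage_tb) * price_per_tb_month * len(monthly_usage_tb)


def pay_as_you_go_cost(monthly_usage_tb: Sequence[float], price_per_tb_month: float) -> float:
    """Cost when each month is billed only for the storage actually used."""
    return sum(monthly_usage_tb) * price_per_tb_month


# Hypothetical usage in terabytes for four months, with one large spike.
usage_tb = [2, 3, 10, 4]
unit_price = 1.0  # arbitrary unit, NOT a real price

fixed = provisioned_cost(usage_tb, unit_price)
flexible = pay_as_you_go_cost(usage_tb, unit_price)
print(f"sized for peak: {fixed}, pay as you go: {flexible}")
```

The more the monthly usage fluctuates, the wider the gap between the two totals; with perfectly flat usage they are equal.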
Two Types of Data Warehouse Modernization
Data warehouse modernization is a top concern for many companies. Our customers at Talend usually approach this modernization in one of two ways:
1. “Lift and shift”: In this scenario, there is an existing stack that is working just fine, but it needs to be modernized – in this case, the on-premise applications will move to the cloud. The enterprise will need to research cloud data warehouse options and then determine whether its current stack will integrate well with any of those options.
The “lift and shift” is a gradual process: first, a small subset of applications is migrated to the new data warehouse solution. That group is tested and validated, and once it is found to be functioning well in the new warehouse, the process is repeated, with a wider scope of applications and users brought over each time. Once the entire current stack has migrated, all new jobs will be designed to feed their data into the new data warehouse.
Negligible risk is assumed with the “lift and shift” method of data warehouse modernization. No huge upfront investments are made, and that’s the beauty of cloud subscription software and services. You can walk away from a project that didn’t work and try something different, or swap in and out the various pieces that are needed to make it a success.
2. Completely new projects: The second approach to data warehouse modernization is just what it sounds like: a new project comes up, and the decision is made that it will not live in a legacy data warehouse. Alternatively, the data in an old warehouse has issues, so you walk away and start fresh; perhaps the data model is no longer valid, or there is a fundamental issue with the data. Another example is a sales and marketing group that wants a data warehouse entirely separate from the corporate one, so it can be configured specifically for the group’s reporting and analytics needs. In all these cases, the data warehouse is built in the cloud from the ground up.
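The gradual “lift and shift” process described above (migrate a small group, validate it, then widen the scope) can be sketched in Python as follows. This is only an illustration under stated assumptions: `migrate_in_waves`, the `migrate` and `validate` callbacks, the doubling wave size, and the application names are hypothetical, and a real migration would plug in its own tooling and acceptance checks.

```python
from typing import Callable, List


def migrate_in_waves(
    applications: List[str],
    migrate: Callable[[str], None],
    validate: Callable[[str], bool],
    first_wave_size: int = 1,
    growth_factor: int = 2,
) -> List[List[str]]:
    """Migrate applications in progressively larger waves.

    Each wave is migrated and validated before the next, larger wave starts.
    A RuntimeError is raised as soon as a wave fails validation, so the
    remaining applications stay on the legacy warehouse.
    """
    if first_wave_size < 1 or growth_factor < 1:
        raise ValueError("first_wave_size and growth_factor must be at least 1")

    waves: List[List[str]] = []
    start = 0
    size = first_wave_size
    while start < len(applications):
        wave = applications[start:start + size]
        for app in wave:
            migrate(app)
        failed = [app for app in wave if not validate(app)]
        if failed:
            raise RuntimeError(f"validation failed for: {failed}")
        waves.append(wave)
        start += len(wave)
        size *= growth_factor  # widen the scope for the next iteration
    return waves


# Hypothetical application names; the callbacks here only record what was moved.
apps = ["billing", "crm", "inventory", "hr", "web_logs", "finance", "support"]
moved: List[str] = []
waves = migrate_in_waves(apps, migrate=moved.append, validate=lambda app: app in moved)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

Stopping at the first failed wave keeps the risk contained: only the applications already validated are running on the new warehouse, and everything else is untouched.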
Popular Cloud Data Warehouse Options
For those enterprises seeking a fresh start, there are a variety of cloud data warehouses available. Solutions like Snowflake, Amazon Redshift, Google BigQuery, and Microsoft Azure SQL Data Warehouse are popular options for storing and analyzing cloud data. Here’s a look at each:
- Snowflake is growing in popularity. Founded in 2012, this startup’s cloud data warehouse solution can adjust to changing data volumes easily and its patented architecture separates compute from storage so you can scale up and down. With Snowflake, you stop paying when the compute isn't happening, and data moves to higher latency storage that's less expensive.
- Amazon Redshift has also been around since 2012 and gained instant prominence as a new member of the Amazon Web Services (AWS) cloud computing umbrella. Redshift is known for its Massively Parallel Processing (MPP) architecture, which makes data loading extremely fast. Because it is in the AWS ecosystem, enterprises moving data from Amazon EC2 or S3 will likely find Redshift to be the fastest and cheapest cloud data warehouse option.
- Google BigQuery came on the scene in late 2011 and runs on top of another member of the Google family, Dremel, an ad hoc query system. Google’s contribution to the cloud data warehouse market is serverless and known for its powerful analysis capabilities.
- Microsoft Azure SQL Data Warehouse is a fully managed, petabyte-scale cloud data warehouse. Like Amazon Redshift, it is built on a massively parallel processing (MPP) architecture. The fastest way to load data into SQL Data Warehouse is to use PolyBase to load it from Azure Blob storage.
The Cloud Data Warehouse Choice is Yours
The importance and scale of big data have grown exponentially. To successfully harness the power of this data, enterprises are making the scalable, secure, and faster cloud their preferred data warehouse solution. Talend offers more than 900 connectors and components that make it easy to transform, profile, and cleanse all your on-premises and cloud data, and then load it into the data warehouse of your choice.