Research indicates approximately 50 percent of business data resides in the cloud, illustrating the importance of external data sources to the modern enterprise. Organizations need similarly modern tools to process and integrate this data in a time frame commensurate with the current speed of business. The best Extract, Load, and Transform (ELT) tools accommodate these workloads and are gaining credence in the data warehouse space as a cost-effective, efficient, high-performance means of data integration, whether the data is external or internal.
Many organizations are increasingly turning to ELT tools to address the volume, variety, and velocity of big data sources, which often strain conventional Extract, Transform and Load (ETL) tools designed for internal, relational data warehousing. In this article we’ll clarify the differences between ETL and ELT tools, discuss how ELT tools enhance data warehouses, and indicate how ELT tools are impacting the future of data integration.
ELT vs ETL: What’s the difference?
ELT is the process by which raw data is extracted from origin sources (Twitter feeds, ERP, CRM, etc.) and loaded into target systems, usually data warehouses or data lakes. Unlike ETL, ELT transforms data within the target system itself, which reduces physical infrastructure and removes intermediate layers.
It’s helpful to think of ELT tools as an evolution of traditional ETL methods. ETL tools are separate platforms that sit between origin systems and target systems. The key difference is that ETL tools transform data before loading it into target systems, while ELT tools transform data within those systems. This distinction is crucial for many downstream processes and affects the following areas.
Infrastructure and Resources
ETL tools are dedicated platforms for the intermediate steps between extracting data and loading it into target repositories. Organizations are tasked with purchasing and maintaining these tools in order to integrate data into target systems. Since ELT tools don’t require this intermediate step to load data into target systems, they require less physical infrastructure and fewer dedicated resources: transformation is performed by the target system’s engine rather than by the engines within ETL tools.
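To make the idea of transforming with the target system’s engine concrete, here is a minimal sketch in Python. It is illustrative only: an in-memory SQLite database stands in for the warehouse, and the table names, column names, and sample rows are all hypothetical. The raw strings are loaded unchanged, and the cleansing and type casting then run as SQL inside the target.

```python
import sqlite3

# Hypothetical raw extract: every value arrives as an untyped string,
# as it might from a CSV export of an operational system.
RAW_ORDERS = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "30.01"),
]


def run_elt(raw_rows):
    """Load raw rows unchanged, then transform them with the target's SQL engine."""
    target = sqlite3.connect(":memory:")  # assumption: SQLite stands in for a warehouse

    # Load: the raw data lands in the target exactly as extracted.
    target.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    target.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: cleansing and typing are executed by the target engine,
    # not by a separate server sitting between source and target.
    target.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        """
    )
    return target


target = run_elt(RAW_ORDERS)
totals = target.execute(
    "SELECT customer, ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

In an ETL flow, the trimming and casting would instead happen on a separate platform before the insert. Here the only compute involved is the target’s own engine.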
ETL tools are responsible for the data staging process in which data is cleansed and prepared for transformation. With ELT, data staging occurs after data is loaded into data warehouses, data lakes, or cloud data storage, resulting in increased efficiency and less latency. Consequently, the best ELT tools place fewer demands on initial data sources and have no need for ETL’s intermediate steps because the majority of the data processing occurs in the target system.
ELT tools generally outperform ETL tools, especially when working with data at scale. Petabytes of data can easily create a bottleneck with ETL tools, since these mechanisms rely on their own servers and engines to transform data. Moreover, transformation complexity increases with the assortment of semi-structured and unstructured data routinely populating big data sources. ETL bottlenecks can considerably prolong the latency of accessing and analyzing data in data warehouses.
Time to Value
With ELT tools, the time to value for actually analyzing and acting on data is accelerated by transforming data within target systems. Data scientists and sophisticated business analysts can leverage schema on read options with minimal manual coding to quickly transform data and use machine learning techniques for analysis. ETL tools are slowed down by manual coding processes required to make all data conform to the uniform schema of a data warehouse, for example, prior to analysis.
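Schema on read can be sketched in a few lines of Python. This is an assumption-laden illustration rather than any particular tool’s API: the event records, the field names, and the `read_with_schema` helper are all hypothetical. Raw records are stored exactly as received, and a schema is applied only when a given analysis reads them.

```python
import json

# Hypothetical raw events, stored exactly as received.
# The records do not share a uniform set of fields.
RAW_EVENTS = [
    '{"user": "alice", "action": "click", "ts": "2019-01-05T10:00:00"}',
    '{"user": "bob", "action": "purchase", "amount": "12.50"}',
    '{"user": "alice", "action": "purchase", "amount": "7.25", "coupon": "NEW"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps a field name to a casting function. Fields missing from a
    record become None; fields not named in the schema are ignored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


# Each analysis chooses its own schema; the stored data never changes.
purchase_schema = {"user": str, "amount": float}
purchases = [
    row
    for row in read_with_schema(RAW_EVENTS, purchase_schema)
    if row["amount"] is not None
]
total = sum(row["amount"] for row in purchases)
print(purchases, total)
```

A different question, such as counting clicks per user, would reuse the same stored records with a different schema, with no up-front modeling of every field.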
The shift from ETL to ELT tools is a natural consequence of the big data age. Traditional ETL tools were created for conventional, relational data warehousing, in which most data came from internal systems and was predominantly structured. The dedicated computational resources for ETL tools simply weren’t made for the scale, variation, and low latency needs of big data workloads. Although these tools may still be viable for structured, internal data, they’re quickly becoming outdated for integrating the array of unstructured and semi-structured big data from external sources—especially for low latency applications like the Internet of Things.
ELT Improves Data Warehousing
There are several ways in which the best ELT tools are useful for improving data warehouses and data lakes. In both cases, ELT tools can expedite the time necessary to prepare data for analysis. By loading data into a data lake framework such as Hadoop, organizations are able to use processing engines within it for staging and transforming data. The Hadoop framework was created for immense scalability and leverages parallel processing to hasten computational jobs. Thus, when simply using ELT to load a data lake, organizations can use this method to derive schema on read without all of the conventional data modeling work needed to unify schema in relational settings.
Loading data warehouses with ELT relies on much of this same methodology. During the transformation process, however, data is transformed into the unified schema of these repositories. There’s also an extra step in which the transformed data is then loaded from a data lake such as Hadoop into the actual warehouse itself. Many of the temporal advantages still apply, however, as do the architectural and infrastructural benefits of leveraging the processing engine of Hadoop for transformation. The benefits of ELT include:
- Streamlined Architecture: By leveraging the processing power of target systems like Hadoop, ELT tools streamline the architecture necessary to prepare data for consumption. There is no intermediate layer with processing power limitations; the target system is used for both data staging and transformation.
- Rapid Incorporation of Big Data Sources: There’s also a wealth of sources involving semi-structured and unstructured big data that are readily incorporated into data warehouses and data lakes with ELT. Those sources are difficult to quickly use with traditional ingestion and transformation methods.
- Data Sandboxes: The recurring benefits of the best ELT tools include the use of data stores such as Hadoop as sandboxes for data scientists to experiment without having to standardize schema according to that of the underlying repository—which is necessary with conventional approaches.
- Storage and Processing: ELT tools enable organizations to use target systems for both storage and processing power. Doing so helps maximize the ROI of these repositories, which assists in justifying these tools to upper level management.
Business Intelligence is still the quintessential use case for data warehousing. The best ELT tools considerably enhance BI in several ways. They enable the rapid incorporation of numerous external sources alongside traditional internal ones, such as supplementing CRM or ERP data with alternative information like social media data. ELT methods enable each of these data sources to load into Hadoop for transformation, empowering data scientists with schema on read to understand how alternative data relates to business needs and warehousing schema. Once those sources are transformed to meet the warehousing schema, users can issue reports on a wider range of data for more meaningful analysis of customer trends.
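As a rough illustration of this BI scenario, the sketch below loads a CRM extract and a social media extract untransformed, conforms them to a single reporting table inside the engine, and then runs a report. Everything here is assumed for the example: an in-memory SQLite database stands in for both the staging area (Hadoop in the text above) and the warehouse, and the table names, columns, and sample values are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # assumption: stand-in for staging area and warehouse

# Load: hypothetical CRM and social media extracts land untransformed.
db.execute("CREATE TABLE raw_crm (customer_id TEXT, email TEXT, handle TEXT)")
db.execute("CREATE TABLE raw_social (handle TEXT, sentiment TEXT)")
db.executemany(
    "INSERT INTO raw_crm VALUES (?, ?, ?)",
    [
        ("1", "alice@example.com", "@Alice"),
        ("2", "bob@example.com", "@bob"),
    ],
)
db.executemany(
    "INSERT INTO raw_social VALUES (?, ?)",
    [("@alice", "0.8"), ("@ALICE", "0.4"), ("@bob", "-0.5"), ("@carol", "0.9")],
)

# Transform: conform both sources to one warehouse table, inside the engine.
# Handles are matched case-insensitively; mentions from non-customers drop out.
db.execute(
    """
    CREATE TABLE customer_sentiment AS
    SELECT CAST(c.customer_id AS INTEGER)  AS customer_id,
           c.email                         AS email,
           COUNT(*)                        AS mentions,
           AVG(CAST(s.sentiment AS REAL))  AS avg_sentiment
    FROM raw_crm AS c
    JOIN raw_social AS s ON LOWER(s.handle) = LOWER(c.handle)
    GROUP BY c.customer_id, c.email
    """
)

# Report: analysts query the conformed table like any other warehouse table.
report = db.execute(
    "SELECT customer_id, mentions, ROUND(avg_sentiment, 2) "
    "FROM customer_sentiment ORDER BY customer_id"
).fetchall()
print(report)
```

The raw tables remain available after the transformation, so a later analysis can derive a different view of the same social data without re-extracting it.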
ELT Paves the Future for Data Integration
Overall, ELT is a compelling paradigm for accommodating the size, speed, and variety of big data routinely used throughout the enterprise today. It forsakes the traditional intermediary layer of ETL to push data staging and transformation into underlying data repositories, using their modern processing power for transformation. Compared to traditional ETL methods, this approach simplifies integration architecture, accelerates time to value, and offers the robust performance necessary to continually mine big data for all it’s worth.
The contemporary emphasis on big data and the increasingly heterogeneous computing environments frequently required will ensure that timely, sustainable, and effective data integrations remain a top organizational priority for some time. Ultimately, ELT solves this issue by enabling a significant degree of flexibility for how data integrations are implemented.
Exemplifying the breadth of ELT advantages, Talend Open Studio is used with many of the more popular big data frameworks today. Explore how ELT can take your enterprise to the next level by downloading Talend Open Studio.