ETL vs ELT: Defining the Difference
The difference between ETL and ELT lies in where data is transformed and how much data is retained in working data warehouses.
Extract/transform/load (ETL) is an integration approach that pulls information from remote sources, transforms it into defined formats and styles, then loads it into databases, data sources, or data warehouses.
Extract/load/transform (ELT) similarly extracts data from one or multiple remote sources, but then loads it into the target data warehouse without any other formatting. The transformation of data, in an ELT process, happens within the target database. ELT asks less of remote sources, requiring only their raw and unprepared data.
Both approaches are viable, but when designing a data architecture, IT decision makers must consider internal capabilities and the growing impact of cloud technologies.
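To make the distinction concrete, here is a minimal sketch that runs both approaches against an in-memory SQLite database standing in for the data warehouse. The sample records, table names, and the `run_etl` and `run_elt` helpers are hypothetical and exist only for illustration; a real pipeline would use an integration tool and a real warehouse.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a remote source.
RAW_ORDERS = [
    ("1001", " Alice ", "19.50"),
    ("1002", "BOB", "5.00"),
    ("1003", "carol", "12.25"),
]


def run_etl(conn):
    """ETL: transform outside the warehouse, then load only the cleaned rows."""
    cleaned = [
        (int(order_id), name.strip().title(), float(amount))
        for order_id, name, amount in RAW_ORDERS
    ]
    conn.execute("CREATE TABLE orders_etl (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows as-is, then transform inside the warehouse with SQL."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(id AS INTEGER) AS id,
               UPPER(SUBSTR(TRIM(customer), 1, 1))
                   || LOWER(SUBSTR(TRIM(customer), 2)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM orders_raw
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)

etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY id").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM orders_raw").fetchone()[0]

# Both paths produce the same cleaned table; only ELT still holds the raw rows.
print(etl_rows == elt_rows, raw_count)
```

In this sketch the cleaned output is identical either way. What differs is where the cleanup runs and whether the untouched source rows remain in the target afterward.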
The evolution of ELT
ELT has been around for a while, but gained renewed interest with tools like Apache Hadoop, a framework for distributing large workloads across anywhere from a few to many thousands of worker nodes for parallel processing. A large task like transforming petabytes of raw data was divided into small jobs, processed remotely, and returned for loading into the database.
But advances in processing power, especially virtual clustering, have greatly increased what local server resources can handle, reducing the need to split jobs. Big data tasks that used to be distributed around the cloud, processed, and returned can now be handled in one place.
How ELT works and when to use it
Unlike ETL, extract/load/transform (ELT) gathers information from any number of sources, loads it into the location where it will be processed, and transforms it there into actionable business intelligence.
- Extract — The first step, extract, works similarly in both data management approaches. Raw streams of data from virtual infrastructure, software, and applications are ingested either entirely or according to predefined rules.
- Load — Here is where ELT branches off from its ETL cousin. Rather than deliver this mass of raw data and load it to an interim processing server for transformation, ELT delivers it whole to the site where it will eventually live. This shortens the cycle between extraction and delivery, but means there is a lot more work to be done before the data becomes useful.
- Transform — The database or data warehouse sorts and normalizes the data, keeping part or all of it on hand and accessible for customized reporting. The overhead for storing this much data is higher, but it comes with more opportunities to custom-mine it for relevant business intelligence in almost real-time.
So is ELT the right choice? Depending on a company’s existing network architecture, budget, and the degree to which it is already harnessing cloud and big data technologies, not always. But when any or all of the following three focus areas are critical, the answer is probably yes.
- When ingestion speed is king. Because ELT doesn’t have to wait for the data to be worked off-site and then loaded (loading and transformation can happen in parallel), it delivers raw information considerably faster than ETL.
- When more intel is better intel. The advantage of turning data into business intelligence lies in the ability to surface hidden patterns as actionable information. By keeping all historical data on hand, organizations can mine along timelines, sales patterns, seasonal trends, or any emerging metric that becomes important to the organization. Since the data was not transformed before being loaded, you have access to all the raw data. Typically, cloud data lakes have a raw data store, then a refined (or transformed) data store. Data scientists, for example, prefer to access the raw data, whereas business users would like the normalized data for business intelligence.
- When you know you will need to scale. When you are using high-end data processing engines like Hadoop, or cloud data warehouses, ELT can take advantage of the native processing power for higher scalability.
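The raw store and refined store mentioned above can be sketched in a few lines. This example again assumes an in-memory SQLite database as a stand-in for the warehouse; the `sales_raw` table, the `sales_by_region` view, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw store: every extracted record is kept exactly as it arrived.
conn.execute("CREATE TABLE sales_raw (sold_on TEXT, region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO sales_raw VALUES (?, ?, ?)",
    [
        ("2023-01-15", "north", "100.00"),
        ("2023-01-20", "south", "50.00"),
        ("2023-07-04", "north", "75.00"),
    ],
)

# Refined store: a normalized view for business users, built inside the warehouse.
conn.execute(
    """
    CREATE VIEW sales_by_region AS
    SELECT UPPER(region) AS region, SUM(CAST(amount AS REAL)) AS total
    FROM sales_raw
    GROUP BY UPPER(region)
    """
)
by_region = conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall()

# A question nobody planned for (monthly totals) is still answerable from the raw store.
by_month = conn.execute(
    """
    SELECT SUBSTR(sold_on, 1, 7) AS month, SUM(CAST(amount AS REAL)) AS total
    FROM sales_raw
    GROUP BY month
    ORDER BY month
    """
).fetchall()

print(by_region)
print(by_month)
```

Because the sale dates were never discarded during loading, the monthly breakdown needs no re-extraction from the source systems.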
Both ETL and ELT are time-honored methodologies for producing business intelligence from raw data. But, as with almost all things technology, the cloud is changing how businesses tackle ELT challenges.
Benefits of using ELT in the cloud
The cloud brings with it an array of capabilities that many industry professionals believe will ultimately make the on-premise data center a thing of the past. The cloud overcomes natural obstacles to ELT by providing:
- Scalability — ELT functions in older, on-site data centers could quickly overwhelm local processing and storage power, requiring expensive hardware upgrades and scheduled downtime while fixes are deployed.
The scalability of virtual cloud infrastructure and hosted services, like integration platform-as-a-service (iPaaS) and software-as-a-service (SaaS), gives organizations the ability to expand resources on the fly, adding the compute time and storage space necessary for even massive data transformation tasks.
- (Almost) seamless integration — Because cloud-based ELT interacts directly with other services and devices across a cloud platform, previously complex tasks like ongoing data mapping are dramatically simplified. What were once monumental challenges can be rendered as simple, interactive graphical interfaces that provide all the critical information at a glance.
- Open source — The best ELT solutions harness the power of living, open-source cloud platforms, which work collaboratively to push improvements, security, and compliance across the enterprise. Open source ELT results in global, professional communities eliminating data challenges as, or even before, they arise in your network.
- Lower cost of ownership — When ETL was the standard, expanded capability meant expanded costs. Beefing up processing and storage resources was a one-way expense, requiring big investments in hardware. The limitation of this approach was that organizations had to pay upfront for the most power they would ever require, but day-to-day operations used only a fraction of that capability.
Like most cloud services, cloud-based ELT is pay-as-you-use. This means that compute and storage costs will run higher when huge ELT jobs are processing, but drop to near zero when the environment is operating under minimal pressure. Averaged annually, this results in far lower total cost of ownership, especially when coupled with no upfront investment.
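The arithmetic behind that claim can be sketched as follows. Every number here is hypothetical, and the sketch assumes the same unit price for owned and metered capacity, which real pricing rarely matches; it only shows why paying for peak capacity around the clock costs more than paying for actual usage.

```python
def fixed_capacity_cost(peak_units, unit_price_per_hour, hours):
    """Provisioning for peak: pay for peak capacity every hour."""
    return peak_units * unit_price_per_hour * hours


def pay_as_you_use_cost(hourly_usage, unit_price_per_hour):
    """Pay-as-you-use: pay only for the units actually consumed each hour."""
    return sum(units * unit_price_per_hour for units in hourly_usage)


# Hypothetical day: a 2-hour heavy job at 100 units, then 22 quiet hours at 2 units.
usage = [100] * 2 + [2] * 22
price = 0.5  # hypothetical price per unit-hour

fixed = fixed_capacity_cost(max(usage), price, len(usage))
metered = pay_as_you_use_cost(usage, price)
print(fixed, metered)
```

The gap narrows as workloads become steadier, so a usage profile like this one is worth measuring before assuming the savings.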
In these and many other ways, the cloud is redefining when and how companies produce business intelligence.
Overcoming common ELT challenges
To get a job done right, every organization relies on the right tools and expertise. As with any task, mistakes early on in the production process are amplified as the project grows, and there are a few common pitfalls that can undermine any ELT architecture.
- Security gaps — Moving petabytes of data, and making it accessible to all applications and users, comes with security risks. A safe design approach integrates security at all levels of the enterprise, ensuring a corrupted or compromised data set can’t infect data warehouses.
- Compliance complacency — Growing compliance frameworks like HIPAA, PCI, and the GDPR create increased pressure on organizations to perform mandatory audits and prove standards are met. Any ELT approach must be designed with compliance in mind to prevent running afoul of national and international regulations.
- Resource bloat — The advantages of having warehouses of data to mine for business intelligence come with one obvious drawback: all of that data must be maintained. Cloud providers and pay-as-you-use pricing make harnessing big data more affordable than ever, but even tiered storage pricing can get expensive without a plan for managing and limiting the endless growth of working information sets.
- Absent data governance — Just as the security of the data moving through an ELT process is critical, so are the five Ws of data governance:
  - Who controls master data management in the organization?
  - What data is gathered/kept?
  - When are overviews and audits performed?
  - Where is data stored?
  - Why are ELT efforts positively impacting business performance?
Answering these key questions in advance creates responsible ELT practices and sets businesses up for rich harvests of information that impact the bottom line every day.
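One way to keep resource bloat in check is a retention rule on the raw store. The sketch below is an assumption-laden illustration rather than a recommended policy: the `events_raw` table, the `prune_raw_rows` helper, and the 90-day window are all hypothetical, and any real retention period has to be set with compliance and governance requirements in mind.

```python
import sqlite3
from datetime import date, timedelta


def prune_raw_rows(conn, today, keep_days):
    """Delete raw rows loaded before the retention cutoff; return how many were removed."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    # ISO-8601 date strings compare correctly as plain text.
    cursor = conn.execute("DELETE FROM events_raw WHERE loaded_on < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_raw (loaded_on TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events_raw VALUES (?, ?)",
    [("2023-01-01", "old"), ("2023-06-01", "recent"), ("2023-06-30", "new")],
)

removed = prune_raw_rows(conn, today=date(2023, 7, 1), keep_days=90)
remaining = [
    row[0] for row in conn.execute("SELECT payload FROM events_raw ORDER BY loaded_on")
]
print(removed, remaining)
```

Running a rule like this on a schedule, or moving older rows to cheaper storage instead of deleting them, keeps the working data set from growing without bound.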
Wrapping up ELT v. ETL
For more than fifteen years, Talend has been providing global partners with the tools they need to transform their businesses. Take charge of even the biggest ELT jobs with Open Studio for Big Data, the free, globally supported platform trusted by some of the world’s biggest corporations.
Get started with ELT or ETL with Talend’s tools for Big Data. Learn more about what Talend can do for organizations and download the Big Data Sandbox to set up a development and testing environment today.