ETL and ELT have a lot in common. At their core, each integration method makes it possible to move data from a source to a data warehouse. The difference between the two lies in where the data is transformed, and how much of data is retained in the working data warehouse. In this article, we’ll consider both ETL and ELT in more detail, to help you decide which data integration method is right for your business.
The Definitive Guide to Data Integration now.
ETL – Extract, transform, load
Extract/transform/load (ETL) is an integration approach that pulls information from remote sources, transforms it into defined formats and styles, then loads it into databases, data sources, or data warehouses.
See how Talend helped Domino’s Pizza ETL data from 85,000 sources.
ELT – Extract, load, transform
Extract/load/transform (ELT) similarly extracts data from one or multiple remote sources, but then loads it into the target data warehouse without any other formatting. The transformation of data, in an ELT process, happens within the target database. ELT asks less of remote sources, requiring only their raw and unprepared data.
The evolution of ELT
ELT has been around for a while, but gained renewed interest with tools like Apache Hadoop. A large task like transforming petabytes of raw data was divvied up into small jobs, remotely processed, and returned for loading to the database.
Improvements in processing power, especially virtual clustering, have reduced the need to split jobs. Big data tasks that used to be distributed around the cloud, processed, and returned can now be handled in one place.
Each method has its advantages. When planning data architecture, IT decision makers must consider internal capabilities and the growing impact of cloud technologies when choosing ETL or ELT.
How ELT works (and when to use it)
Unlike ETL, Extract/Load/Transform is the process of gathering information from an unlimited number of sources, loading them into a processing location, and transforming them into actionable business intelligence.
- Extract — The first step, extraction, works similarly in both data management approaches. Raw streams of data from virtual infrastructure, software, and applications are ingested either entirely or according to predefined rules.
- Load — Here is where ELT branches off from its ETL cousin. Rather than deliver this mass of raw data and load it to an interim processing server for transformation, ELT delivers it whole to the site where it will eventually live. This shortens the cycle between extraction and delivery, but means there is a lot more work to be done before the data becomes useful.
- Transform — The database or data warehouse sorts and normalizes the data, keeping part or all of it on hand and accessible for customized reporting. The overhead for storing this much data is higher, but it comes with more opportunities to custom-mine it for relevant business intelligence in almost real-time.
When is ELT the right choice?
Depending on a company’s existing network architecture, budget, and the degree to which it is already harnessing cloud and big data technologies, not always. But when any or all of the following three focus areas are critical, the answer is probably yes.
When ingestion speed is the number one priority.
Because ELT doesn’t have to wait for the data to be worked off-site and then loaded, (data loading and transformation can happen in parallel) the ingestion process is much faster, delivering raw information considerably faster than ETL.
When more intel is better intel.
The advantage of turning data into business intelligence lay in the ability to surface hidden patterns into actionable information. By keeping all historical data on hand, organizations can mine along timelines, sales patterns, seasonal trends, or any emerging metric that becomes important to the organization. Since the data was not transformed before being loaded, you have access to all the raw data. Typically, cloud data lakes have a raw data store, then a refined (or transformed) data store. Data scientists, for example, prefer to access the raw data, whereas business users would like the normalized data for business intelligence.>
When you know you will need to scale.
When you are using high-end data processing engines like Hadoop, or cloud data warehouses, ELT can take advantage of the native processing power for higher scalability.
Both ETL and ELT are time-honored methodologies for producing business intelligence from raw data. But, as with almost all things technology, the cloud is changing how businesses tackle ELT challenges.
Best Practices for Managing Data Quality: ETL vs ELT now.
Benefits of using ELT in the cloud
The cloud brings with it an array of capabilities that many industry professionals believe will ultimately make the on-premise data center a thing of the past. The cloud overcomes natural obstacles to ELT by providing:
- Scalability — ELT functions in older, on-site data centers could quickly overwhelm local processing and storage power, requiring expensive hardware upgrades and scheduled downtime while fixes are deployed.
The scalability of a virtual, cloud infrastructure and hosted services — like integration platform-as-a-service (iPaaS) and software-as-a-service (SaaS) — give organizations the ability to expand resources on the fly. They add the compute time and storage space necessary for even massive data transformation tasks.
- (Almost) seamless integration — Because cloud-based ELT interacts directly with other services and devices across a cloud platform, previously complex tasks like ongoing data mapping are dramatically simplified. What were once monumental challenges can be rendered as simple, interactive graphical interfaces that provide all the critical information at a glance.
- Open source — The best ELT solutions harness the power of living, open-source cloud platforms, which work collaboratively to push improvements, security, and compliance across the enterprise. Open source ELT results in global, professional communities eliminating data challenges as, or even before, they arise in your network.
- Lower cost of ownership — When ETL was the standard, expanded capability meant expanded costs. Beefing up processing and storage resources was a one-way expense, requiring big investments in hardware. The limitation of this approach was that organizations had to pay upfront for the most power they would ever require, but day-to-day operations used only a fraction of that capability.
Like most cloud services, cloud-based ELT is pay-as-you-use. This means that compute and storage costs will run higher when huge ETL jobs are processing, but drop to near zero when the environment is operating under minimal pressure. Averaged annually, this results in far lower total cost of ownership — especially when coupled with no upfront investment.
In these and many other ways the cloud is redefining when and how companies are localizing business intelligence productions.
ETL vs ELT: Defining the Difference now.
Overcoming common ELT challenges
To get a job done right, every organization relies on the right tools and expertise. As with any task, mistakes early on in the production process are amplified as the project grows, and there are a few common pitfalls that can undermine any ELT architecture.
- Security gaps — Moving petabytes of data, and making it accessible to all applications and users, comes with security risks. A safe design approach integrates security at all levels of the enterprise, ensuring a corrupted or compromised data set can’t infect data warehouses.
- Compliance complacency — Growing compliance frameworks like HIPAA, PCI, and the GDPR create increased pressure on organizations to perform mandatory audits and prove standards are met. Any ELT approach must be designed with compliance in mind to prevent running afoul of national and international regulations.
- Resource bloat — The advantages of having warehouses of data to mine for business intelligence come with one obvious drawback: all of that data must be maintained. Cloud providers and pay-as-you-use pricing make harnessing big data more affordable than ever, but even tiered storage pricing can get expensive without a plan for managing and prohibiting the endless growth of working information sets.
- Absent data governance — Just as the security of the data moving through an ELT process is critical, so are the five Ws of data governance:
-Who controls master data management in the organization?
-What data is gathered/kept?
-When are overviews and audits performed?
-Where is data stored?
-Why are ELT efforts positively impacting business performance?
Answering key questions in advance creates responsible ELT practices and sets businesses up for rich harvests of information that daily impacts the bottom line.
Data integration with ETL and ELT
Integrating your data doesn’t have to be complicated or expensive. Talend Cloud Integration Platform simplifies your ETL or ELT process, so your team can focus on other priorities.
With over 900 components, you’ll be able to move data from virtually any source to your data warehouse more quickly and efficiently than by hand-coding alone. We’ll help you reduce your spend, accelerate time to value, and deliver data you can trust.
Download a free trial of Talend Cloud Integration and see how easy ETL can be.