How to ETL SaaS data to a data warehouse using Stitch
If you're using a cloud data warehouse, you're using a platform with the ability to scale quickly to handle any processing load. That means you don't have to transform your data in the ETL pipeline before loading it, which many businesses that use on-premises data warehouses do to avoid taking CPU cycles away from users running analytics. You can extract and load raw data, then run transformations and do modeling in the data warehouse. This makes the data pipeline simpler, and potentially opens it up to a wide variety of data sources.
Stitch Data Loader is a great way to get raw data from more than 100 SaaS and database sources into a cloud data warehouse, where it can be used for reporting. Let's walk through the process. For this example, I'll use data from Invoiced, a cloud-based accounts receivable platform automates billing tasks like sending out invoices, following up with late-paying customers, and reconciling incoming invoice payments.
Getting started with Stitch Data Loader
Stitch makes extracting data from a source and loading it into a data warehouse easy. To get started, visit the signup page, enter your email address, then enter your name and a password.
Add an integration
Next, add the data source as an integration within Stitch. Click on the Invoiced icon to get started:
Enter a name for the integration. The name will display on the Stitch dashboard for the integration and will be used to create the name in your destination. You must also enter an API key — a unique string that Stitch uses to authenticate to the data source — which you can generate by following the documentation.
The other two settings ask how much historical data you want to replicate to your data warehouse and how often you want to replicate new data.
When you click Check and Save, Stitch displays another screen that lets you choose what tables you want to replicate.
Each time you check a table, Stitch will display all the fields in the table, so you can choose the ones you want.
When you're done with one table, click the name of the integration in the breadcrumb bar to return to the previous screen, where you can select additional tables. When you have all you want, click Finalize Your Selections, and voilà, your integration has been added. All new fields and records of the types you've selected will be replicated to your data warehouse — but first you have to connect the data warehouse you set up to Stitch as a destination.
Selecting a data warehouse destination
Stitch supports all the major cloud data warehouses. If you don't already use one, we have some advice on choosing one that meets your needs. For illustration purposes, let's suppose your organization uses Amazon Redshift.
Clicking on the Redshift icon brings you to a screen where you can enter your credentials:
Now all the pieces are in place, and the data is ready to flow.
When you visit your Stitch dashboard, the status for your new integration may show as pending while Stitch schedules the initial replication job. If you refresh the screen after a few minutes the status will change to active.
Now you can add integrations from other data sources. The Stitch documentation walks through the process for each one.
That's all there is to it. An ETL tool like Stitch makes it simple to move data from SaaS sources into a data warehouse.