Talend Cloud How-to #2: Moving Data from Salesforce to Snowflake

Now that we have introduced you to Talend Cloud, we are going to talk about how to use the cloud integration platform to get data from here to there. In this example, we will move data from Salesforce to a Snowflake cloud data warehouse.

We will show you how to:

  • Set up a job in Talend
  • Run everything in a remote engine connecting to Salesforce
  • Pull the data through the remote engine
  • Write to the Snowflake cloud data warehouse

…All within our Amazon Virtual Private Cloud (VPC).

Creating the Job to Move the Salesforce Data

Now that we have explained the Talend Open Studio and walked you through the Talend Cloud UI in Part 1, let’s start our job by dragging the existing metadata object from the repository “Account” from Salesforce into the open palette in the middle. On the screen that pops up, we select the input option – because we want to pull our specific set of Salesforce CRM data. After that is selected, we see an “Input” component on our open palette.

talend cloud

Now we go to the bottom part of the screen to configure the component. We want a very specific set of fields, so we use a manual query that we have predefined.  To do this, we copy that and paste the query into the “Full SOQL query string” field. 

Next, we select to “Guess schema” based off the query, and the results appear in the “Full SOQL query string” field. Then we check “Edit Schema” to view the query or the schema and see exactly what we have pulled. What we see are the main fields we want to pull out of Salesforce and put into the Snowflake cloud data warehouse, which is order and account information. It looks good to us, and we can move on to the next step.

The tMap and Salesforce Data Lookups

Next, we do a quick search to find the  tMap.  The tMap component is a core feature of Talend Open Studio and is primarily used for mapping input fields to output fields and transforming the input data. When the tMap opens, it appears on the middle palette, and we map it to our “Account” object, which contains our Salesforce data. We have now successfully connected the output of Salesforce into the tMap.

salesforce cloud

Then we want to find the main contact information that we want to load to the Snowflake cloud data warehouse. We go to the left side of the screen to the metadata repository under “Salesforce” and pull in “Contacts” to the middle palette. We do this as a lookup: for every order we have the account, but we also want to know who the main contact is. We use a predefined query that limits the number of columns, because most of the time every object in Salesforce has hundreds of columns, and this is a fast and easy way to limit the columns that are pulled.

Download The Cloud Data Integration Primer now.
Download Now

We hit “Guess schema” to apply our predefined query into the defined columns. Below you can see we have first name, last name, account ID...the basic things we want. We drag that to our tMap which is almost ready to go, but first we do a quick test. We search for, and find, “tLogRow” which pulls the output for us to see as a quick test, now we just need to connect the output.

schema

We go into our tMap to join the data from our first Salesforce input to our contacts by simply dragging the “Order AccountId” and down to the “Id”.  This is doing a basic lookup or a join between those two sources, so we will now get all the contacts for that particular account in the order.

Next, we pull our output over to the right side of the screen to the output schema by highlighting and dragging and dropping it over for very fast mapping of inputs to outputs. It’s easy to test the output schema we just created using the tLogRow.

Running a Test of Salesforce to Snowflake

To run this test, go to the “Job SFDC to Snowflake” tab at the bottom of the Open Studio screen and click “Run”. We see the data flowing in the job we just created, and now we know the join is successful, so we remove the tLogRow and complete the job.

flows

Next in the metadata repository, we will find the table in Snowflake that’s called “SFDC_CHEMICAL_ORDERS” –  this is where we want our Salesforce data to be written into Snowflake.  Then we go back to the tMap and see the schema of the table from Snowflake.

We click on “Auto map!” and do some quick data conversions: the quantity (below) in Salesforce is a double, but in the Snowflake cloud data warehouse, it’s a big decimal, so we go to the expression builder and use an example of our expression we have ready. Finally, we map the order date, the account name, and the product ID manually because they didn't auto map. We do this by dragging and dropping. Then, we want to see how it has populated into the Snowflake cloud data warehouse.

We hit “OK” and run this from the Talend Open Studio first and make sure it works, and it does, so we can move on to the next step: publishing to Talend cloud integration platform. To do this, we run the AWS VPC using the Talend Remote Engine which we pre-installed. The Talend Remote Engine is a small agent that you can install behind your company’s security walls, and it communicates to Talend Cloud through simple http protocols.  We return to the Snowflake database worksheet, where we can execute queries against our database. We see that we went from 820 to 984 rows, indicating we are inserting data into our table through Talend Open Studio. Now that we have built the job, we are ready to publish to the cloud.

We go back to the Talend screen, and to publish, we first save it, and right click “Publish” to connect to the cloud integration environment.

Then it asks us where we want to publish it to, and we choose our pre-defined environment – which is the production (“Prod”) environment (publishing straight to production is not best practice, but we are using it for this demo). We select it and hit “Finish”.  Then we watch as it builds the Java libraries and objects. Next we want to check out the results.

select salesforce             

Publishing the Job to the Cloud

We then return to Talend Cloud to publish our job to the cloud and automate it. Inside the cloud integration platform, we see “Flows”, “Operations”, “Manage”, and “Administration” on the left side of the screen. To see the job that we just published, we go into “Manage” and do a search to find the “SFDC to Snowflake” job we just published to the cloud.

We want to put our Salesforce to Snowflake job in an environment to run, so we go into “Flows”. We see our job has been published and is ready to go. There is a question mark next to it because it’s never been run or scheduled. We click on the job so we can do the final set up on it.

vpc

We are going to run this on our Virtual Private Cloud (VPC) and we’ll use a remote engine to get our job to our VPC that’s within AWS, so it’s in a secure network. We set up access so this process will read from SFDC and write to our Snowflake database within the VPC. We click into the “Type” drop-down to schedule our job, then hit “GO”.

snowflake

We are brought to an overview screen where we can see the activity happening and monitor the data, and now we are running inside the remote engine using the Talend cloud integration tool. In this UI, our hosted environment is kicking off the final process that reads from Salesforce and writes to the Snowflake data warehouse.  To prove it’s working, we should see our counts go up, and do a quick run on our query – and see more data has been added to Snowflake. 

Testing our Talend Open Studio Job

When we see that the job is running successfully, and once the job has completed we can go back to the Snowflake query tool and see the counts have incremented some more again.  We have successfully built a job in Talend Open Studio that pulls data from Salesforce and writes into a table in the Snowflake cloud data warehouse.  We tested and deployed to the Talend Cloud where we have successfully deployed to a Talend Runtime Engine installed on my AWS VPC.  

We hope we have demonstrated how fast and easy and fast it is to get your data from here to there with Talend Cloud. In this article, we moved Salesforce data to the Snowflake cloud data warehouse, but with Talend, data can move anywhere: ground to ground applications, custom applications, database to database. It can be cloud to cloud, or ground to cloud. Talend Cloud offers total flexibility and freedom to make the connections you want to make.

| Last Updated: August 8th, 2019