ETL with Salesforce

The Salesforce platform, along with its range of cloud service offerings, continues to be a major player in the CRM landscape. Salesforce’s popularity is due in large part to its functionality and versatility, as well as its capacity for integration with other applications and platforms. In this article we will discuss those integrations, as well as what you need to know to prepare for ETL with Salesforce.

What is Salesforce?

Salesforce is the leading cloud-based customer relationship management (CRM) platform in the world, with a 19.6% share of the market. Salesforce offers a wide range of applications for managing business processes including sales, customer service, marketing, and ecommerce. Some of its most well-known capabilities include storing and managing customer accounts, marketing data analysis, and business intelligence.

Salesforce pioneered the concept of software-as-a-service (SaaS) when, in 1999, it began offering CRM solutions in the cloud. Until then, most companies chose to create in-house CRM products that were expensive and took years to complete.

Two decades later, Salesforce applications are deployed across a range of industries and verticals, by organizations ranging from small startups to huge corporations including GE, Vodafone, Coca-Cola, and AirAsia. Today, Salesforce products are widely used in the non-profit, government, and private sectors. Some of the most well-known Salesforce applications include Sales Cloud, Marketing Cloud, and Service Cloud.

Most users rely on more than one Salesforce application. Schneider Electric is a typical example: the company began using a single Salesforce CRM solution in 2010, but has since brought an additional 30 Salesforce applications into its service.

ETL with Salesforce

So now that we’ve covered Salesforce, let’s talk about data integration and why it’s important for any company looking to streamline, simplify, or consolidate their data sources.

Simply put, data integration is the process of combining data from different internal and external sources into a single, centralized repository. For example, a company may have customer data stored in a legacy database, manage inventory data with a third-party platform, and collect still other data through other applications that, at one time, may have been the best option.

Situations such as these are not uncommon. As companies grow and change, so too do their software and data needs, and a strategy that once made sense needs to be revised. That’s where data integration comes in.

At the heart of data integration is the ETL (extract, transform, load) process. Data integration begins with extracting data from various sources and moving it into a single data warehouse. (For companies and organizations that do not use a data warehouse, the process is similar, although the data will be integrated directly into the target system.) To facilitate the integration process, Salesforce features a range of interface points including the REST, SOAP, and Bulk APIs. These interfaces let integration tools connect to Salesforce and extract data, and the Bulk API in particular is designed for large volumes of records.
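As a rough illustration of the extract step, the sketch below pages through a Salesforce REST API query result. The REST query resource returns a batch of records together with a done flag and, when more rows remain, a nextRecordsUrl to follow. Everything else here is an assumption made for the example: the fetch_json helper (which in practice would wrap an authenticated HTTP GET against your instance), the API version in the path, and the sample pages.

```python
from typing import Callable, Dict, Iterator
from urllib.parse import quote

API_VERSION = "v58.0"  # assumed version; use whichever one your org supports


def query_path(soql: str) -> str:
    """Build the relative path for the REST query resource."""
    return f"/services/data/{API_VERSION}/query?q={quote(soql)}"


def extract_records(soql: str, fetch_json: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every record for a SOQL query, following nextRecordsUrl pages.

    fetch_json is a hypothetical helper: given a relative path, it performs
    an authenticated GET and returns the decoded JSON body.
    """
    path = query_path(soql)
    while path:
        page = fetch_json(path)
        yield from page["records"]
        # When done is true there is no further page and the loop ends.
        path = None if page.get("done", True) else page.get("nextRecordsUrl")


# Stand-in for a real HTTP client, so the sketch runs without credentials.
FAKE_PAGES = {
    query_path("SELECT Id, Name FROM Account"): {
        "done": False,
        "nextRecordsUrl": "/services/data/v58.0/query/01gFAKE-2000",
        "records": [{"Id": "001A", "Name": "Acme"}],
    },
    "/services/data/v58.0/query/01gFAKE-2000": {
        "done": True,
        "records": [{"Id": "001B", "Name": "Globex"}],
    },
}

accounts = list(extract_records("SELECT Id, Name FROM Account", FAKE_PAGES.__getitem__))
```

Passing the HTTP call in as a function keeps the paging logic separate from authentication, which tends to differ from one organization to the next.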

During the transformation stage, data is cleansed, validated, de-duplicated, organized, and standardized. At this point, the various data sets share a consistent format and can be compared and combined with each other. Finally, the newly converted data is loaded (or delivered) to its final destination.
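To make the transformation stage more concrete, here is a minimal sketch that standardizes, validates, and de-duplicates a handful of contact rows. The field names, the validation rule, and the choice of email as the de-duplication key are all assumptions for illustration; real pipelines usually apply far richer rules.

```python
from typing import Dict, Iterable, List, Optional


def standardize(record: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Trim whitespace and normalize case so equivalent values compare equal."""
    return {
        "email": (record.get("email") or "").strip().lower(),
        "name": " ".join((record.get("name") or "").split()).title(),
    }


def is_valid(record: Dict[str, str]) -> bool:
    """Very loose validation: require a name and an email containing '@'."""
    return bool(record["name"]) and "@" in record["email"]


def transform(records: Iterable[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Standardize, validate, and de-duplicate on email (first occurrence wins)."""
    seen = set()
    result = []
    for raw in records:
        clean = standardize(raw)
        if not is_valid(clean) or clean["email"] in seen:
            continue
        seen.add(clean["email"])
        result.append(clean)
    return result


raw_rows = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},  # duplicate after cleanup
    {"name": "No Email", "email": None},                    # fails validation
]
clean_rows = transform(raw_rows)
```

Standardizing before de-duplicating matters here: the first two rows only look identical once case and whitespace have been normalized.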

Benefits of moving your data to Salesforce

Better insights

Data in a silo simply isn’t as useful as data that’s been liberated. Your Salesforce application contains important details about your customers; unless that data is integrated with your other platforms, you may be missing out on critical insights, patterns, or trends.

Increased efficiency

As an organization onboards new applications, data integration allows you to automatically pull data from the new application into the unified view. This saves time in reporting and analysis since there is no need to log in to multiple systems to gather data.

Improved data quality

Data integration leads to higher confidence in the data because conflicts and inconsistencies are resolved during the integration process.

Deeper data analytics

Data integration adds context to the data and makes it possible to see the big picture. It enables a range of reports and dashboards that can speed up analytics, and provides a path to the big data analytics that have become essential for businesses who want to stay competitive.

Data migration and integration — what’s the difference?

Before moving on, it’s important to understand the difference between data integration and data migration. The terms are sometimes used interchangeably, but they describe distinct, separate processes. They do, however, share some of the same implementation techniques.

Data integration is the process of combining data from multiple sources, both internal and external, into a common view. The objective here is to create a “single version of truth” that leads to better business insights. For example, when Salesforce is integrated with other marketing systems, it can lead to better lead generation and channel strategies. Data integration describes a unified set of smaller processes. In other words, data integration is more of a big picture concept.

Data migration, on the other hand, is the act of moving data from one system to another. When a company decides to change its existing CRM system to Salesforce, or when it decides to upgrade from an earlier version to a recent one, it has to migrate all the data from the current system to the new one. Data migration is a more specific process than that of data integration.

Common integration methods

Up to this point, we’ve taken a high-level view of the data integration process and how it combines data from multiple sources into a single view. Some of the different methods of data integration include:

Manual data consolidation

Your data needs to be moved into a common repository, and manual consolidation is one way to do it. This part of the process usually requires conventional ETL, although some companies may use custom in-house tools. Manual consolidation may work well for smaller, simpler data sets that do not require deep cleansing, but it can prove to be too time-intensive and error-prone for larger data sets. Also, the absence of real-time data limits its usefulness.

Data propagation using applications

The objective here is to propagate data from individual applications to a common data warehouse, and the integration logic to achieve that sits in individual sales applications. Rather than a common tool or approach to transfer data into the warehouse, every application takes responsibility to move its data to the central store. This method is usually adopted because there may be some heavy cleansing and manipulation required of the data, and the application is best positioned to understand and perform those operations.

However, this approach is difficult to maintain as applications are susceptible to changes, which often means that the integration logic needs to be rebuilt or adjusted.

Data propagation using middleware

Similar to the previous method, here too the objective is to propagate data to a warehouse. However, this abstracts the integration logic from applications and shifts the responsibility to middleware. For example, a publish-subscribe mechanism configured between Salesforce and the data warehouse ensures that every time there is an update, an event is triggered to publish the data automatically to the warehouse, keeping it up to date.

Even when applications undergo changes, the middleware continues to work as a bridge transferring data to the central store.

However, for this method to work, there has to be an implementation layer that manipulates and transforms the data in a format that the consumer (warehouse) understands. There may also be a slight lag between the publisher and the consumer for high volumes, which may not work for some applications that demand real-time data.
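The publish-subscribe idea, including the transformation layer that reshapes data for the consumer, can be sketched in a few lines. This is an in-memory toy under stated assumptions, not a real message broker: the Broker class, the topic name, and the dictionary standing in for a warehouse table are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]


class Broker:
    """Tiny in-memory stand-in for publish-subscribe middleware."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver the event to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(event)


warehouse: Dict[str, Event] = {}  # hypothetical warehouse table keyed by account id


def load_account(event: Event) -> None:
    """Transform the source event into the warehouse's format, then upsert it."""
    row = {"account_id": event["Id"], "account_name": event["Name"].strip()}
    warehouse[row["account_id"]] = row


broker = Broker()
broker.subscribe("account.updated", load_account)

# The source application only publishes; it knows nothing about the warehouse.
broker.publish("account.updated", {"Id": "001A", "Name": " Acme "})
broker.publish("account.updated", {"Id": "001A", "Name": "Acme Corp"})
```

Because the handler upserts by account id, the second event replaces the first row rather than adding a duplicate, which is what keeps the warehouse current.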

Data virtualization

In virtualization, data is not pulled out and stored in a common repository. Instead, virtualization provides a mechanism for data from multiple sources to be accessed and viewed through a front-end.

The technique has the advantage of not having to create and manage a warehouse and does not suffer from data lags. It’s well suited to highly secure applications that do not allow data to be stored somewhere else. However, it restricts the scope of what can be done with data. For example, data lineage is difficult to maintain and hence deriving insights from data can prove to be a challenge. Also, the front-end is not lightweight, as it’s constantly querying various data sources, adding load to those databases.
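A small sketch can show both sides of that trade-off. The VirtualView class below is a hypothetical illustration, not any product's API: it stores nothing and simply reads every source each time it is asked for rows, which is why the data is always current and also why the sources carry the query load.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
Source = Callable[[], Iterable[Row]]


class VirtualView:
    """Query-time union over several sources; nothing is copied or stored."""

    def __init__(self, sources: Dict[str, Source]) -> None:
        self._sources = sources

    def rows(self) -> List[Row]:
        combined = []
        # Every call reads every source again; there is no cached copy.
        for name, read in self._sources.items():
            for row in read():
                combined.append({**row, "source": name})
        return combined


calls = {"crm": 0, "billing": 0}  # counts how often each source is queried


def read_crm() -> List[Row]:
    calls["crm"] += 1
    return [{"customer": "Acme"}]


def read_billing() -> List[Row]:
    calls["billing"] += 1
    return [{"customer": "Acme", "balance": "120.00"}]


view = VirtualView({"crm": read_crm, "billing": read_billing})
first = view.rows()
second = view.rows()
```

After two reads of the view, each source has been queried twice, which is the extra load described above.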

Challenges to data integration

In the Progress-Dimensional Research survey of Salesforce users, 54% of respondents indicated that application and data source integration is their primary challenge. Let’s look at a few reasons why data integration remains a challenge:

Complexity of systems

Bringing together data from a huge number of systems that use different technologies and span different locations can be quite a complex task. The sheer size, volume, and complexity of this process require careful planning and coordination.

Data mapping

As data fields tend to be stored with different names and types in different data sources, it’s a difficult task to map every field to the destination system. Some of the data sources could also be legacy systems with substantial data gaps. Solving these issues requires collaboration between business and technical stakeholders who have a deep understanding of the data.
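One common way to keep a mapping manageable is to write it down as data rather than burying it in code. The sketch below assumes a hypothetical warehouse schema: the source fields are standard Salesforce Account fields, while the destination column names and converters are invented for the example.

```python
from typing import Any, Callable, Dict, Set, Tuple

# Hypothetical mapping: source field -> (warehouse column, type converter).
FIELD_MAP: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "Id": ("account_id", str),
    "Name": ("account_name", str),
    "AnnualRevenue": ("annual_revenue_usd", float),
    "NumberOfEmployees": ("employee_count", int),
}


def map_record(source: Dict[str, Any]) -> Dict[str, Any]:
    """Rename and convert mapped fields; missing or null values become None."""
    target = {}
    for src_field, (dest_column, convert) in FIELD_MAP.items():
        value = source.get(src_field)
        target[dest_column] = None if value is None else convert(value)
    return target


def unmapped_fields(source: Dict[str, Any]) -> Set[str]:
    """List source fields with no mapping, so they are reviewed rather than silently dropped."""
    return set(source) - set(FIELD_MAP)


sf_account = {"Id": "001A", "Name": "Acme", "AnnualRevenue": "1500000", "Industry": "Retail"}
mapped = map_record(sf_account)
leftover = unmapped_fields(sf_account)
```

Reporting the leftover fields gives business and technical stakeholders a concrete list to discuss, and filling gaps with None makes missing data in legacy sources visible instead of hiding it.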

Finding the right experts

Integrating Salesforce with a data warehouse needs experts across different fields such as Salesforce, CRM, data warehouses (or data lakes), and integration technologies. Putting together such a team and ensuring that they communicate effectively can be challenging.

Formulating a uniform data integration strategy

Salesforce is a cloud-native application, but other data sources may live either in the cloud or on-site. Consolidating this mix of cloud and on-premises sources can mean different approaches to integrating their data. However, divergent approaches can lead to inconsistent handling of data, which can in turn compromise data quality. Creating a uniform strategy that ensures data integrity and synchronization despite the individuality of systems can prove to be a tough ask.

Ensuring continuous data integration

Data integration isn’t a one-time task. The initial effort to bring in data is significant, but, additionally, there needs to be continuous effort to automatically update the common store when changes occur.
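One widely used pattern for keeping the common store current is an incremental sync driven by a high-water mark: each run asks only for rows changed since the previous run. The sketch below builds such a query against SystemModstamp, a system field that Salesforce updates whenever a record changes. The function names are hypothetical, and the timestamp format parsed in next_watermark is an assumption about how the API returns datetimes.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional

# Assumed response format, e.g. "2024-01-16T08:00:00.000+0000".
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def incremental_soql(sobject: str, fields: Iterable[str], since: Optional[datetime]) -> str:
    """Build a SOQL query that only selects rows changed after the watermark."""
    soql = f"SELECT {', '.join(fields)} FROM {sobject}"
    if since is not None:
        # SOQL datetime literals are unquoted ISO 8601 values.
        stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        soql += f" WHERE SystemModstamp > {stamp}"
    return soql + " ORDER BY SystemModstamp"


def next_watermark(records: Iterable[Dict[str, str]], current: Optional[datetime]) -> Optional[datetime]:
    """Return the latest SystemModstamp seen, or the current watermark if no rows arrived."""
    latest = current
    for record in records:
        seen = datetime.strptime(record["SystemModstamp"], STAMP_FORMAT)
        if latest is None or seen > latest:
            latest = seen
    return latest


last_run = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
query = incremental_soql("Account", ["Id", "Name", "SystemModstamp"], last_run)
changed = [{"Id": "001A", "SystemModstamp": "2024-01-16T08:00:00.000+0000"}]
new_mark = next_watermark(changed, last_run)
```

Passing None as the watermark produces a full extract, so the same function covers both the initial load and every later update.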

Despite these challenges, Salesforce data integration remains an important part of an organization’s strategy to achieve a unified view of the data. A clear integration strategy and a data integration tool help to overcome these roadblocks.

Before you integrate: a checklist

Your company’s unique character, needs, and strategy will determine exactly how you’ll go about integrating your data. In order to know which is the best way forward, there are some steps you can take before you begin:

Identify your stakeholders

These may include Salesforce experts, data engineers, customers, and other specialists with a comprehensive view of the organizational data.

Establish a collaborative platform

How will your team of stakeholders share information and plan?

Recognize your resource constraints

What are the limits on budget, time, and personnel?

Know your technical constraints

Does your data need to be available in real-time, or can it be pulled on-demand or in batches?

Decide on your integration approach

Which works best for your company — manual consolidation, data propagation to a warehouse using applications, data propagation to a warehouse using middleware, or virtualization?

Map it out

Match Salesforce data fields to your own.

Choose your method

Will you use APIs or point-and-click methods to manage the integration?

Select your tool

Engage a data integration platform to plan, simplify, and complete your integration.

Choosing your Salesforce integration method

There is no one-size-fits-all approach for Salesforce data integration. While some methods rely on a data warehouse, some may not depend on common storage. While some work with manual methods, some are automated. Some use application logic, some use middleware, and some depend on ETL. Some may have a hybrid approach — for example, automated data updates combined with manual validations by backend sales teams to confirm data validity.

The final solution that an organization arrives at depends on many factors — inclination to build a data warehouse, availability of resources such as time and money, size of datasets, the need for real-time availability of data, and so on.

Simplify your integration

A data integration tool helps to simplify the complexities of the integration process by providing an automated mechanism that consolidates data from multiple on-premises and cloud sources. Such a tool not only enables quicker ETL operations but also ensures continuous, real-time updates to the centralized data store. By doing so, it minimizes or completely eliminates human intervention, reduces errors, saves time, and thereby increases productivity and data quality.

Moreover, the tool makes it easier to scale as more data sources are added. Rather than having a fragmented approach that has a separate integration method for each source, the tool offers a consistent solution across the board. It also allows the various stakeholders to communicate effectively using a shared framework.

Finding your Salesforce data solution

When it comes to data integration, your company is aiming at a moving target. New applications come into the picture, business needs change, and stakeholders revisit their priorities. Yet the ultimate goal remains the same: you want to integrate your data to extract its maximum value by delivering keen business intelligence. 

Talend Open Studio for Data Integration is an open source ETL tool that integrates Salesforce data with your existing data warehouse and synchronizes data between systems. Its unified Eclipse IDE provides data integration features such as mapping, aggregating, sorting, enriching, and merging data, along with the tools to develop and deploy data integration jobs. And it does it ten times faster than hand coding.

Take the guesswork out of your Salesforce migration. Download a free trial and see the future of your Salesforce data.
