Building ‘Houses’ in the Cloud
In my IT career I have had the opportunity to work on many great Data Management projects, ranging from simple extract, transform and load (ETL) assignments that support operational systems like CRM, SFA, and ERP, to simple Data Warehouses. I have been on some very impressive Master Data Management (MDM) and Data Quality projects for some of the top companies in their sectors, using both ETL and real-time Data Services integration patterns. But I have taken a break from that and now work for a company that provides tools to help you build the very data fabric that all enterprises need to be successful.
Talend recently launched a new product in the integration Platform-as-a-Service (iPaaS) space that makes it even easier for customers to build and deploy their integration patterns in the cloud, where they don’t need to provision their own infrastructure and hardware. This is a completely hosted Data Integration platform, and if all your sources and targets are in the cloud, then your entire solution can be hosted and run in the cloud.
As part of my new role at Talend, I am fortunate enough to have early access to many of the products and am required to become an early expert in order to train other technical professionals in the company. Sometimes this can be a blessing and sometimes it’s a challenge, as I can be dragged into some really hairy projects. In this case I was excited when our CMO, Ashley Stirrup, came to me and asked if I would help build our own internal cloud-based Customer Data Warehouse (CDW): a complete data warehouse built entirely in the cloud.
End-to-End Sales and Marketing Data Integration
The concept of the CDW was pretty simple: the executives wanted to see and measure the effectiveness of all Marketing and Sales activities from beginning to end for all our customers. The secondary project objective was to build the entire CDW using cloud technologies, including Talend's Integration Cloud Platform. The three sources were Marketo (Marketing Automation), Salesforce.com (Sales and Campaign Operations) and NetSuite (Billing and Invoicing), all Software-as-a-Service (SaaS) platforms. We enlisted our partner, full360, to build the Data Warehouse in Amazon Web Services (AWS) Redshift, with the online edition of Tableau for the visualization layer. The partner had a lot of experience with Talend's on-premises tools but, like everyone, was new to the Cloud edition. It was my job to assist with the migration of the traditional Talend jobs to the cloud, a process we referred to as "Cloudifying" the flows.
The process was very simple, and it took next to no time to build using the different components and connections Talend provides. We built the flows and tested the overall process from our local development studios. This included a full batch control process within the Redshift tables to ensure that all extracts from the sources out to AWS S3 were successful before loading data to the production Reporting tables on Redshift. We also used several Data Quality Actions. Actions are "Predefined Integration Patterns" used in a data flow in the cloud, in this case to fix data quality issues. Once these were defined, I saw many steps that were excellent candidates to be reused as Actions, such as the batch control process, which needed to retrieve a batch ID before every flow and then update a table at the end to record that the process was successful or report a failure. I turned this into a ‘simple Action’ that all Flows use in sequence in order to keep the entire process in check.
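To make the batch control idea concrete, here is a minimal sketch in Python. It is not the actual Talend Action: SQLite stands in for Redshift, and the batch_control table, its columns, and every function name are assumptions made up for illustration. It shows the three steps described above: get a batch ID before a flow runs, record success or failure when it ends, and only allow the production load if every extract succeeded.

```python
import sqlite3
from datetime import datetime, timezone


def init_batch_table(conn):
    # Hypothetical control table; the real CDW kept its equivalent in Redshift.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS batch_control (
               batch_id    INTEGER PRIMARY KEY AUTOINCREMENT,
               flow_name   TEXT NOT NULL,
               status      TEXT NOT NULL,
               started_at  TEXT NOT NULL,
               finished_at TEXT,
               error       TEXT
           )"""
    )
    conn.commit()


def _now():
    return datetime.now(timezone.utc).isoformat()


def open_batch(conn, flow_name):
    """Register a new run and return its batch ID."""
    cur = conn.execute(
        "INSERT INTO batch_control (flow_name, status, started_at) "
        "VALUES (?, 'RUNNING', ?)",
        (flow_name, _now()),
    )
    conn.commit()
    return cur.lastrowid


def close_batch(conn, batch_id, succeeded, error=None):
    """Mark the run as SUCCESS or FAILED."""
    conn.execute(
        "UPDATE batch_control SET status = ?, finished_at = ?, error = ? "
        "WHERE batch_id = ?",
        ("SUCCESS" if succeeded else "FAILED", _now(), error, batch_id),
    )
    conn.commit()


def run_flow(conn, flow_name, flow):
    """Wrap a flow (any callable taking the batch ID) with batch control."""
    batch_id = open_batch(conn, flow_name)
    try:
        flow(batch_id)
    except Exception as exc:
        close_batch(conn, batch_id, False, str(exc))
    else:
        close_batch(conn, batch_id, True)
    return batch_id


def all_succeeded(conn, batch_ids):
    """True only if every listed batch finished with SUCCESS."""
    if not batch_ids:
        return False
    placeholders = ",".join("?" for _ in batch_ids)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM batch_control "
        f"WHERE status = 'SUCCESS' AND batch_id IN ({placeholders})",
        list(batch_ids),
    ).fetchone()
    return count == len(batch_ids)
```

In this sketch, each extract would be wrapped with run_flow, and the load into the reporting tables would run only when all_succeeded returns True for the batch IDs of that cycle.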
The best part of this CDW process was that all my testing and production deployment was a matter of a "right click" and deploy, and I was done! I didn't have to call up my favorite hardware guys and order new integration servers or database servers, because all the infrastructure was created for me in the cloud. It really is as simple as that: one right click to deploy to the cloud, and I am ready to test, schedule, and run my integrations in production for my completely hosted Data Warehouse.
Overall, building my first completely cloud-hosted Data Warehouse was a great experience! Of course, many of you still have data sources that are not in the cloud, and you will need the on-premises functionality that Talend Integration Cloud offers. But in this CDW project it was very fulfilling to have an entire project where I didn't need to involve the Infrastructure team or worry about securing space in some data center. Finally, it’s important to note that as the requirements for the CDW grow, I know that AWS and Talend are both capable of scaling to meet the need with very little effort.
Final Data Warehouse Result