The Evolution of ETL and Continuous Integration


In the Beginning of ETL…

When I started my IT career over 15 years ago, I was nothing more than a “fresh-out” with a college degree and an interest in computers and programming.  At that time, I knew the theories behind the Software Development Life Cycle (SDLC) and had put them into practice in a classroom setting, but I was still left wondering how it all related to the big, bad corporate world.  And by the way, what the heck is ETL?

Since then I have become VERY familiar with ETL and the broader scope of Data Integration, and I have used the SDLC extensively throughout my IT journey.  And what a journey it has been!  While the underlying concepts of ETL have remained unchanged (extract data from a source; manipulate, cleanse, and transform it; then load it to a target), the implementation of Data Integration has transformed into what we now call Continuous Integration or Continuous Delivery.  While the Software Development Life Cycle is cyclical in nature, it still has a beginning and an end.  When new requirements arose or a new project kicked off, a new but separate life cycle was started.  Today, with the ever-changing business climate and business analysts needing information immediately, there isn’t time to start a new project.  What used to be a four-week engagement to design, develop, test, and deploy a simple report now needs to be done literally overnight.  How can large corporations keep pace with their competitors, let alone a small company avoid being pushed out of the market, when the market landscape can change on a dime?
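The extract-transform-load pattern described above fits in a few lines of code.  Here is a minimal sketch (the field names and data are invented for illustration; a real pipeline would read from a database or file rather than a hard-coded list):

```python
# Minimal ETL sketch: extract raw records, cleanse/transform them, load to a target.
# The records and field names are illustrative only.

def extract():
    # Extract: pull raw rows from a source (hard-coded here for illustration)
    return [
        {"name": "  Acme Corp ", "revenue": "1000"},
        {"name": "globex", "revenue": "2500"},
    ]

def transform(rows):
    # Transform: cleanse and standardize each record
    # (trim whitespace, normalize casing, convert types)
    return [
        {"name": r["name"].strip().title(), "revenue": int(r["revenue"])}
        for r in rows
    ]

def load(rows, target):
    # Load: append the cleansed rows to a target
    # (a plain list stands in for a warehouse table)
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)
# [{'name': 'Acme Corp', 'revenue': 1000}, {'name': 'Globex', 'revenue': 2500}]
```

The three steps stay the same whether the "source" is a flat file and the "target" is a list, or the source is an ERP system and the target is a data warehouse; only the scale and tooling change.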


Before you can understand how the IT industry has changed in the past 15 years, you have to know what it was like in what I call the “Dark Ages”: pre-Y2K.  Working in the IT “Command Center” for one of the largest engineering and manufacturing companies in the US, I spent my days staring at a wall of computer terminals, manually scanning for error messages from batch jobs that human operators had kicked off by hand.  When I wasn’t busy in the Command Center, I spent my time in the “Data Warehouse,” playing librarian to hundreds of thousands of plastic cassettes.  This is not to be confused with what we now call a data warehouse; this was literally a brick-and-mortar warehouse, tens of thousands of square feet, that stored shelves upon shelves of plastic cassette tapes, any of which could at any point be called upon to be loaded into a “silo” for data retrieval or backup.  Talk about slow and inefficient.  Back then, the biggest question business analysts were asking their IT department was “Can our systems handle the year 2000?”

A few years later we are past the Y2K scare, and companies are finally catching on to the concepts of data integration, ETL, and sharing information between systems.  It was the Age of Enlightenment.  There was just one problem: all the solutions were siloed (little to no cross-platform or cross-application communication) and wildly inefficient.  Sure, if you were an all-Oracle shop or an all-IBM shop everything played nicely, but who could afford that?  In one of my first ETL projects I spent six weeks single-handedly writing a 2,500-line SQL package to pull account information from an Oracle Applications data entry point, standardize the information (using MY OWN logic, because there were no data standardization tools), and attempt to match it to a D&B number before loading it to a reporting data warehouse.  SIX WEEKS!!  That doesn’t even include testing and deployment.  In today’s business landscape, not only should that simple process be done in an afternoon, it HAS to be done in an afternoon, or your competition will leave you in the dust!
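The hand-rolled standardize-and-match logic from that project might have looked something like the sketch below.  Everything here is hypothetical: the company names, the D&B numbers, the suffix table, and the exact-match strategy are all invented stand-ins for what was, back then, thousands of lines of custom SQL:

```python
# Hypothetical sketch of home-grown name standardization and D&B matching.
# All reference data below is invented for illustration.

# Stand-in for a D&B reference table keyed by standardized company name
DNB_LOOKUP = {
    "ACME CORPORATION": "00-123-4567",
    "GLOBEX CORPORATION": "00-765-4321",
}

# Hand-maintained suffix expansions (the "MY OWN logic" part)
SUFFIX_MAP = {"CORP": "CORPORATION", "CO": "COMPANY", "INC": "INCORPORATED"}

def standardize(name):
    # Uppercase, strip punctuation, and expand known abbreviations
    words = name.upper().replace(".", "").replace(",", "").split()
    return " ".join(SUFFIX_MAP.get(w, w) for w in words)

def match_dnb(name):
    # Exact match against the reference table after standardizing;
    # returns None when no D&B number is found
    return DNB_LOOKUP.get(standardize(name))

print(match_dnb("Acme Corp."))   # 00-123-4567
print(match_dnb("Unknown LLC"))  # None
```

Multiply this by every punctuation quirk, misspelling, and regional suffix in real account data, and it becomes clear why the original took six weeks, and why dedicated data standardization tools were such a leap forward.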

ETL in the Age of Big Data

But lo and behold, as the next few years come and go, we enter the Golden Age of ETL and Data Integration.  Applications finally catch up to the needs of the business: applications that specialize in ETL, others that specialize in MDM, and still others that specialize in ESB, Data Quality, and even BI reporting.  Businesses are sharing and reporting information like never before and making critical business decisions on in-house and/or third-party data.  These new applications become a gift from the heavens for large corporations, helping them share data among their many different systems and make sense of their ever-increasing data volumes.  But they come with a hefty price tag.  On top of the already exorbitant seat-license cost, if you want to connect to your CRM, MDM, or ESB applications or your reporting database, that’s an additional cost of $10K or more per year PER CONNECTOR.  The cost adds up fast!  Multi-million-dollar licensing contracts were the norm.

On top of all of that, the SDLC processes and procedures were outdated.  It might take three to six months to build, test, and deploy an ETL process to load third-party data into a data warehouse.  Then, due to the sheer volume of the data, it would take a week simply to run the process, only to find out the data quality was poor.  By the time you cleaned up your corrupted data warehouse and got accurate data for this month, the vendor was ready to send you the next month of data for analysis.  Companies became process-driven, and by the time they had all the facts in front of them, they were REACTING to the market rather than pacing or predicting it.

Dawn of the Data-Driven Age

So here we are in the middle of 2016, and it is the dawn of the Data-Driven Age.  Not only is data at an all-time premium in terms of asset value, it comes from all sources and all directions.  It is critical in driving your business to success, and if you are not a data-driven enterprise you will be left behind.  So the big question is, “How do I become a data-driven enterprise?”  First, you have to re-evaluate your current data integration solutions, and second, you have to rethink your current Software Development Life Cycle procedures.  Data should be your number-one asset, and the tools, processes, and procedures you use to collect, store, and analyze that data should not limit your data capabilities.  Companies must have the agility to adjust, seemingly overnight, to the ever-changing business climate and technology trends.

Talend Data Fabric, coupled with its Continuous Integration development practice, is your answer.  With a technology-agnostic framework, well over 900 included connectors, the broad support of an open-source community, and a subscription-based pricing model, Talend Data Fabric allows you to integrate all your sources of data (whether on-premises, in the cloud, traditional databases, HDFS, NoSQL, etc.) through a single, unified platform, at a fraction of the cost of traditional Data Integration platforms.  Talend’s integrated Continuous Integration development practice allows IT to stay abreast of the latest industry trends and meet the demands of constantly changing business needs, keeping your business at the forefront of the market.

Prior to 2000, the number-one question business analysts were asking their IT departments was “Can our systems handle the year 2000?”  Sixteen years later, the number-one question a CIO should be answering is “Are we a data-driven enterprise?”  If the answer is “No,” they should be looking at Talend Data Fabric for solutions.

Products Mentioned:

Talend Data Integration

Related Resources:

Self-Service Analytics – An O’Reilly Best Practices Report

Why All Enterprise Data Integration Products Are Not Equal
