Month: June 2016

The Evolution of ETL and Continuous Integration

In the beginning of ETL… When I started my IT career over 15 years ago, I was nothing more than a “Fresh-out” with a college degree and an interest in computers and programming. At that time, I knew the theories behind the Software Development Life Cycle (SDLC) and had put it into some practice in […]

How to Aggregate Clickstream Data with Apache Spark

As part of a POC of the Big Data capabilities in Talend v6.1, one of our long-time customers, a major e-commerce company, asked me to present a solution for aggregating huge files of clickstream data on Hadoop. The input data was a giant clickstream file from a website (larger than 100 GB, or even terabytes). Our […]