
Month: June 2016

Data Preparation, to the Moon and Beyond

  According to a University of Southern California study, less than a decade ago the total digital information held on storage devices worldwide reached 300 exabytes. To picture that, imagine it would take over 400 billion CD-ROMs to hold it. If you were to stack them, you would cover more than the distance from […]







Data Prep 101: Diving into Enterprise Features

  A few months ago I wrote a blog post about the exciting new open source Data Preparation tool and all the quick actions you can take on your data. But it gets so much better! Where the single-user desktop Free Data Preparation tool stops (as well as many of the not-so-free […]







IoST and IoUT: Why They Matter for IoT Growth

  Recently, I was fortunate enough to find myself in Munich, Germany, during a trip to visit family, and discovered that just north of town is the city of Ingolstadt, home to the Audi factory. Being somewhat of a gear-head and very much an Audi fan, I decided to take the factory […]







Complex Generation and Distribution of Documents with Talend

  In this post, I would like to cover the options we have for building complex document-generation systems. This was formerly the domain of expensive software like Adobe Publisher. With Talend Open Studio and JasperReports, you can create such a system yourself. Introduction / Conceptual Formulation One of Germany’s biggest online vehicle dealers gets […]







The Evolution of ETL and Continuous Integration

   In the beginning of ETL… When I started my IT career over 15 years ago, I was nothing more than a “fresh-out” with a college degree and an interest in computers and programming. At that time, I knew the theories behind the Software Development Life Cycle (SDLC) and had put them to some practice […]







Moving Data to the Coalface to Achieve Business Success

  Self-service data preparation, which we define as empowering business workers and analysts to prepare data for themselves prior to analysis, is often cited as the next big thing. In fact, Gartner predicted last year that “by 2018 most business users and analysts in organisations will have access to self-service tools to prepare data for analysis”. The great […]







How to Aggregate Clickstream Data with Apache Spark

  As part of a POC of the Big Data capabilities in Talend v6.1, I was asked by one of our long-time customers, a major e-commerce company, to present a solution for aggregating huge files of clickstream data on Hadoop. The input data was a giant clickstream file (larger than 100 GB, or even terabytes) from a website. Our […]