Today, Docker and Kubernetes are obvious choices. But back in 2015, these technologies were just emerging and hoping for massive adoption. How do tech companies make the right open source technology choices early? As a CTO today, if you received an email from your head of Engineering saying, “Can we say that Docker is Enterprise production ready now?”, your answer would undoubtedly be “yes”. If y...
How to Develop a Data Processing Job Using Apache Beam – Streaming Pipelines
In our last blog, we talked about developing data processing jobs using Apache Beam. This time we are going to talk about one of the most in-demand topics in the modern Big Data world: processing streaming data. The principal difference between Batch and Streaming i...
Making data-intensive processing efficient and portable with Apache Beam
The appearance of Hadoop and its related ecosystem was like a Cambrian explosion of open source tools and frameworks for processing large amounts of data. But companies that invested early in big data ran into some challenges. For example, they needed engineers with expert knowledge not only of distributed systems and data processing but also of Java and the related JVM-based languages and tools. Another issue was that the system constraints at the time were constantly evolving as new systems appeared...

10 Things You’re Doing Wrong in Talend
…and how to fix them! We’ve asked our team of Talend experts to compile this top ten list of their biggest bugbears when it comes to jobs they see in the wild – and here it is! 10. Size does matter Kicking off our list is a common problem – the size of individual Talend jobs. Whilst it is often convenient to contain all similar logic and data in a single job, you can soon run into problems when building or deploying a huge job, not to mention trying to debu...
How to Develop a Data Processing Job Using Apache Beam
This blog post is part 1 of a series of blog posts on Apache Beam. Are you familiar with Apache Beam? If not, don’t be ashamed. As one of the latest projects developed by the Apache Software Foundation, first released in June 2016, Apache Beam is still relatively new in the data processing world. As a matter of fact, it wasn’t until recently, when I started to work closely with Apache Beam, that I loved to learn and...
7 Emerging Open Source Big Data Projects that will Revolutionize Your Business
Open source software (OSS) just celebrated its 20th anniversary, and the community not only has a lot of milestones to celebrate but also a lot to look forward to! OSS continues to disrupt the status quo in groundbreaking ways, but it’s also becoming increasingly mainstream. Thus, if you’re an IT leader at an organization of any size, you should be thinking about and planning for how to incorporate OSS into your infrastructure. Among the hundreds of popular open so...
Open Source: 20 years of Innovation and the Best is Yet to Come
In 1998, Netscape decided to release their source code in an effort to attract new users to their product and new developers who could easily integrate applications with the browser. At the same time, there seemed to be a groundswell around a culture of open and collaborative development, with legacy software companies beginning to ack...
The Cloud of Yesterday, Today, and Tomorrow
Cloud computing in the form we understand today started around 10 years ago, with the launch of Amazon Web Services (AWS). This was the first commercially viable option for businesses to store data in the cloud rather than on-premises, and it acted as a shared service for anyone connecting to the platform. Early-stage cloud computing was certainly more technical than it is now. However, it was no more technical than managing a data center, something that IT departments at the time were well used to. Any...
The Paradise Papers: How the Cloud Helped Expose the Hidden Wealth of the Global Elite
In early 2016, the International Consortium of Investigative Journalists (ICIJ) published the Panama Papers, one of the biggest tax-related data leaks in recent hi...