News from AWS re:Invent – How do you solve the complex data problem?


Yesterday at AWS re:Invent, Werner Vogels, Amazon’s Chief Technology Officer, outlined a vision for a modern data architecture that spans data ingestion, lifecycle management, data governance, orchestration and job scheduling. This vision included the announcement of AWS Glue. Glue is really two things: a Data Catalog that provides metadata about data stored in Amazon or elsewhere, and an ETL service, which is largely a successor to AWS Data Pipeline, first launched in 2012.

Anyone familiar with Talend knows that we’ve been executing toward a similar vision for some time with our platform and solutions such as Talend Big Data Integration and recent innovations in Talend Data Preparation. One of the key differences with our vision is that whereas Amazon is looking to meet the needs of their platform users, Talend is addressing user requirements across a number of platforms, including multiple clouds, Hadoop, SaaS applications and traditional data warehouses.

While we share a similar vision, Talend and Amazon are addressing different needs within the data management market. Amazon is approaching the problem from the perspective of a platform provider working to simplify the development of custom applications on its platform. The easier it is for developers to build applications on AWS, the more value customers will get and the more they will use the AWS platform. Our focus is on the data integration developer who is solving more complex data integration problems. These challenges typically span many data sources within and outside of Amazon and require deep, rich transformation, cleansing and governance capabilities.

AWS Glue is focused on Python developers hand-coding applications on top of Amazon. It provides useful tools aimed at streamlining data movement on the Amazon platform. Its catalog tool gives developers visibility into where data lives, and its ETL service lets developers generate starter Python code that they can then edit in their favorite IDE, such as PyCharm. While AWS Glue should be a good productivity boost for developers building analytic applications on top of EMR or Redshift, it is not intended to address the deeper data integration needs of a dedicated data integration developer.

Our view is that over the next decade, enterprises will need to solve data challenges in a deeply hybrid world, where each workload runs on whichever platform offers the services best suited to it. It’s also clear that many enterprises will hedge against becoming too dependent on any single platform and will use a mix of platforms such as AWS, IBM Watson, GE Predix, Salesforce, SAP, Google Cloud and Microsoft Azure.

Talend users are typically knowledgeable in Java, not Python, and they have a far deeper set of integration needs.  Our users will typically need to connect to tens or hundreds of data sources, both on-premises and in the cloud.  For example, Lenovo is using Talend to connect to 60 different data sources, including web logs, e-commerce, customer service and social media data.  Once these developers have access to this data, they typically need to do far richer data transformation.  For example, with EMR (Hadoop), Amazon would expect developers to program their data quality rules by hand in Python.  With Talend, we provide them with pre-built data quality components that run natively inside EMR or any other instance of Hadoop.
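To make the hand-coding point concrete, here is a minimal sketch of what hand-written data quality rules in plain Python might look like. It is only an illustration under stated assumptions: the field names (`customer_id`, `email`, `order_amount`), the rule names and the helper functions are hypothetical, and nothing here is Amazon's or Talend's actual API.

```python
import re

# Hypothetical rule for illustration: a loose "looks like an email" pattern.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_record(record):
    """Return the names of the (assumed) quality rules this record violates."""
    violations = []
    # Rule 1: customer_id must be present and not blank.
    if not str(record.get("customer_id") or "").strip():
        violations.append("missing_customer_id")
    # Rule 2: email must roughly match name@domain.tld.
    if not EMAIL_PATTERN.match(record.get("email") or ""):
        violations.append("invalid_email")
    # Rule 3: order_amount must be a non-negative number.
    amount = record.get("order_amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        violations.append("invalid_order_amount")
    return violations


def split_records(records):
    """Separate clean records from rejected ones, keeping the reasons."""
    clean, rejected = [], []
    for record in records:
        violations = check_record(record)
        if violations:
            rejected.append((record, violations))
        else:
            clean.append(record)
    return clean, rejected
```

Every new rule, source format or reject-handling policy means more code like this to write and maintain by hand, which is the gap that pre-built data quality components are meant to close.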

In the long run, we expect that Amazon will continue to expand its services, but with a focus on the analytic application developer, while Talend will continue to specialize in the deeper integration challenges. Solving these deeper problems will require strong multi-cloud and on-premises capabilities; continued innovation around data quality, including business-friendly user interfaces for data preparation; and rich data governance and lineage capabilities. These are the requirements we are hearing from our customers and where we plan to focus our product development efforts.

Amazon is a key partner for Talend.  In fact, earlier this year we announced that Talend is “All-in” on Amazon.  There are several opportunities for Talend and Amazon to work together to improve how these two types of developers collaborate.  Our first opportunity is likely to be integrating Talend’s data management tools on top of Amazon’s data catalog, and there may well be more mutually beneficial opportunities in the future.
