I am very excited to introduce Talend Data Streams, a brand new, cloud-native application that enables you to get streaming data integrations up and running in minutes all while providing unparalleled portability powered by Apache Beam.
Why Talend Data Streams?
One of the biggest challenges businesses face today is working with many kinds of streaming paradigms while also handling new types of data arriving from social media, the web, sensors, the cloud, and so on. Companies see real-time data as a game changer, but it’s still a challenge to actually get there.
Take IoT data, for example. Data from sensors or internet-connected “things” is always on: the stream is non-stop and flows all the time. A typical batch approach to ingesting or processing this data no longer works, because the data has no start or stop.
Different devices also produce heterogeneous data formats. For example, there could be hundreds of sensors on a single wind turbine to monitor and collect data about oil level, the position and sway of the tower, pressure on the blade, temperatures, and so on. The sensors themselves could even run different firmware or come from different manufacturers. There is often no standard for IoT devices. And because devices are mixed, the schema of the data is prone to change unpredictably, which can easily break data pipelines. If you do get through that, though, IT still needs to deliver the data to business owners.
In a recent survey of data scientists, over 35% reported that the unavailability of data and the difficulty of accessing it are top challenges for them [1]. Many business users would likely agree, and they often resort to an ad hoc approach to working with their cloud applications and data sources when IT can’t help.
Introducing Talend Data Streams
As we saw this scenario play out over and over again with customers and prospects, we knew we could help. That’s why we built Talend Data Streams. So what is it?
Talend Data Streams is a self-service web UI, built in the cloud, that makes streaming data integration faster, easier, and more accessible, not only for data engineers, but also for data scientists, data analysts and other ad hoc integrators so they can collect and access data easily.
It is built to help our customers further close the gap between IT and business teams, so they can enable more users and more use cases.
So what makes Data Streams so unique? Here are a few highlights I really wanted to share with Talend users:
The live preview in Talend Data Streams allows you to do incremental data integration design, which we call “continuous design”.
You no longer need to design the whole pipeline, compile, deploy, run, and then test and debug to see if it actually works. It is similar to the read-eval-print loop (REPL) concept often used in data science. You can see your data change in real time, at every step of your design process, in the very same design canvas. This dramatically reduces development time and shortens the design cycle.
Talend Data Streams is completely schemaless. And that brings benefits for both design time and run time.
Designers can create and refine pipelines more easily because schemas are discovered dynamically and enforcement is optional. Pipelines are also more resilient to schema changes. For example, imagine a scenario where you are streaming from a message queue. Several message structures may coexist, such as sensor messages and machine messages. Being schemaless allows a pipeline to adapt automatically to multiple data variants during ingestion, as opposed to creating as many pipelines as there are variants.
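To make the idea concrete, here is a minimal sketch in plain Python of what ingesting several message structures through one code path can look like. It is not how Talend Data Streams is implemented; the message fields (`sensor_id`, `machine_id`, and so on) and the helper names `describe` and `ingest` are assumptions made up for this illustration.

```python
import json


def describe(record):
    """Return the field names discovered in one record, as a sorted tuple."""
    return tuple(sorted(record))


def ingest(raw_messages):
    """Parse each message and group records by their discovered structure.

    No schema is declared up front: the structure of each record is read
    from the record itself, so new variants need no new pipeline.
    """
    variants = {}
    for raw in raw_messages:
        record = json.loads(raw)
        variants.setdefault(describe(record), []).append(record)
    return variants


# Hypothetical messages from one queue: two sensor readings, one machine event.
messages = [
    '{"sensor_id": "s-1", "temperature": 21.5}',
    '{"machine_id": "m-7", "status": "running"}',
    '{"sensor_id": "s-2", "temperature": 19.0}',
]

variants = ingest(messages)
for fields, records in variants.items():
    print(fields, len(records))
```

A single `ingest` call handles both variants, and a third message structure would simply show up as a new key instead of breaking the run.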
Unparalleled Portability with Apache Beam
Talend has long been a leader in big data, and our open source approach allows us to help customers run on the data framework of their choice and move to the next one when it comes around. A typical example is when we switched our code generator from MapReduce to Spark. Now we are pushing this model to a whole new level by embracing Apache Beam.
Apache Beam is an open source framework led by Google, Talend, data Artisans, PayPal, and others. At its core, Apache Beam is an abstraction layer that provides a portable data pipelining framework. It decouples design from runtime and merges batch and streaming into a single pipeline semantics. Because Talend Data Streams is powered by Apache Beam, it gives customers unparalleled portability.
So you can plug the same pipeline into a bounded source, like a SQL query, or an unbounded source, such as a message queue, and it will run as a batch pipeline or a streaming pipeline based simply on the data source. Beyond that, you can even choose to run natively on the cloud platform where your data resides, truly achieving “design once and run anywhere” with portability across multiple clouds.
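The bounded versus unbounded idea can be sketched without any framework. The snippet below is plain Python, not the Apache Beam API: the `pipeline` function, the reading fields, and the 20-degree threshold are all assumptions chosen for illustration. What it shows is that one piece of pipeline logic can consume a finite list or an endless generator unchanged.

```python
import itertools


def pipeline(source):
    """Hypothetical pipeline logic: keep warm readings and convert the units.

    It only iterates over `source`, so it does not care whether the source
    is finite (batch) or endless (stream).
    """
    for reading in source:
        if reading["celsius"] > 20.0:
            yield {
                "id": reading["id"],
                "fahrenheit": reading["celsius"] * 9 / 5 + 32,
            }


# Bounded source: a finite list, standing in for the result of a SQL query.
bounded = [{"id": 1, "celsius": 25.0}, {"id": 2, "celsius": 15.0}]
batch_result = list(pipeline(bounded))


def unbounded():
    """Unbounded source: an endless generator, standing in for a message queue."""
    for i in itertools.count(1):
        yield {"id": i, "celsius": 18.0 + i}


# The stream never ends, so take only the first three results for the demo.
stream_result = list(itertools.islice(pipeline(unbounded()), 3))

print(batch_result)
print([r["id"] for r in stream_result])
```

In a real Beam pipeline the runner, not the pipeline author, takes care of windowing and scheduling for unbounded sources; this sketch only illustrates why keeping the logic independent of the source makes that possible.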
Embedded Python Component
Last but not least, we wanted Talend Data Streams to be an app that could embrace the data scientist and coder community. So we embedded a Python component to allow them to script or code with Python for customizable transformations.
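As a rough idea of the kind of logic such a component is meant for, here is a small custom transformation written in plain Python. The component's actual interface is not described in this post, so the `transform` function, the record fields, and the 80-degree threshold are hypothetical and serve only as an example of per-record scripting.

```python
def transform(record):
    """Hypothetical per-record transformation: clean a field and add a flag."""
    out = dict(record)  # copy so the incoming record is left untouched
    out["device"] = record["device"].strip().upper()
    out["overheated"] = record["temperature_c"] > 80  # assumed threshold
    return out


# Made-up sample record for the demo.
sample = {"device": " turbine-12 ", "temperature_c": 85.5}
result = transform(sample)
print(result)
```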
Looking to bridge the IT & Business Gap, and put more data to work?
What’s even better about Data Streams is that it’s not a standalone app or a single point solution. It is part of the Talend Data Fabric platform, which helps companies break down barriers and collaborate like never before, deliver data they can trust, and make data a team sport. How so?
Data pipelines, data sets, and metadata can all be shared across the Talend platform and with other apps. This dramatically increases the reusability of your data and, more importantly, brings your IT and business teams together to achieve collaborative data management and better governance.
Ad hoc integrators, such as data scientists, can ingest the data they need more easily without going to IT all the time.
And of course, IT gets all the other benefits of Talend Data Fabric and stays in control of data usage, so it’s easy to audit and to ensure privacy, security, data quality, and so on.
We are excited to bring a free edition of the product to market via AWS Marketplace, so anyone with an AWS account can launch and use it immediately, with zero software cost. You can find more details about the product features at https://www.talend.com/products/data-streams/data-streams-free-edition/
Launch now: www.talend.com/datastreams-aws/
[1] The State of Data Science & Machine Learning 2017, Kaggle: https://www.kaggle.com/surveys/2017