The Union of Real-Time and Batch Integration Opens Up New Development Possibilities

June 18, 2015

Big Data processing platforms in the Hadoop ecosystem offer two integration modes that correspond to different types of usage but are increasingly used in combination. "Batch" or "asynchronous" mode is used to schedule processing jobs, typically overnight. Examples of batch mode include a bank branch integrating the day's deposits into its books, a distributor loading or updating a new product nomenclature, or a business owner consolidating sales for all branches for a given period. The primary advantages of batch mode are the ability to process huge data sets and to meet most traditional corporate analytics needs (business management, customer and marketing insight, decision-making support, etc.).

However, batch processing has a key limit: its latency makes real-time integration impossible. This is a serious problem for companies that need to respond to customers on the spot, for example by making a recommendation to an Internet user in the middle of a purchase (think Amazon), serving an ad targeted at a specific Internet user within milliseconds, taking immediate account of fast-changing factors such as weather or traffic conditions in order to improve decision-making, or detecting fraud.

In the Hadoop ecosystem, a new solution to this problem has emerged: Spark, a project of the Apache Software Foundation, offers a near-real-time integration mode, also referred to as "streaming". This multipurpose analysis engine is well suited to fast processing of large data sets and covers the same functions as MapReduce, with far better performance. It handles both data acquisition and data processing, at speeds commonly cited as 50 to 100 times greater than those of MapReduce, particularly for in-memory workloads.
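To make the difference between the two modes concrete, here is a minimal sketch in plain Python. It does not use the Spark API; it only illustrates the idea that near-real-time processing can be treated as the same logic applied to a series of small batches. The record shape, the sample data, and the names `total_by_branch` and `micro_batches` are assumptions made up for this illustration.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator

Event = tuple[str, float]  # (branch, deposit amount); hypothetical record shape


def total_by_branch(events: Iterable[Event]) -> Counter:
    """Sum deposit amounts per branch; this is the shared business logic."""
    totals: Counter = Counter()
    for branch, amount in events:
        totals[branch] += amount
    return totals


def micro_batches(events: Iterable, size: int) -> Iterator[list]:
    """Cut an event stream into consecutive batches of at most `size` events."""
    iterator = iter(events)
    while chunk := list(islice(iterator, size)):
        yield chunk


# Hypothetical sample data standing in for a day's deposits.
deposits = [("north", 100.0), ("south", 40.0), ("north", 60.0),
            ("east", 25.0), ("south", 10.0)]

# Batch mode: one pass over the full data set, typically run overnight.
batch_totals = total_by_branch(deposits)

# Near-real-time mode: the same function runs on each small batch,
# so up-to-date running totals exist after every couple of events.
running_totals: Counter = Counter()
for chunk in micro_batches(deposits, size=2):
    running_totals.update(total_by_branch(chunk))
```

Both modes end with the same totals; the difference is that the second one has a usable intermediate result after each small batch instead of only at the end of the run.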

Today, Talend supports both of these integration modes and makes it possible to switch from one to the other transparently, whereas most solutions on the market require a total overhaul of the data integration layer. This simplifies not only the development of processing jobs but also the management of their overall life cycle (updates, changes, re-use). Faced with the growing complexity of big data technologies, Talend has worked to support all Hadoop distributions on the market (especially the most recent versions) while masking their complexity behind a simple and intuitive interface. Spark is now at the heart of Talend's batch and real-time integration offering.

What's more, Spark now includes new functions that give companies a growing range of options for real-time activities. One example is support for machine learning, now a native Spark feature. The primary advantage of machine learning is that processing improves as the system learns from data. Combining batch and real-time processing to meet today's corporate needs is also just around the corner: for example, setting up a processing chain that uses weekly (batch) sales figures to build predictive functions, then applies them in real time to speed up decision-making and avoid missing opportunities as they arise.
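The processing chain described above can be sketched as follows, again in plain Python and not with Spark's machine learning library. The batch step fits a simple least-squares trend to weekly sales figures, and the real-time step compares incoming orders against the resulting forecast. The function names `fit_trend` and `watch_sales`, the sample figures, and the choice of a linear trend are all assumptions for illustration only.

```python
from typing import Callable, Iterable, Iterator


def fit_trend(weekly_sales: list[float]) -> Callable[[int], float]:
    """Batch step: fit a least-squares line through weekly sales figures.

    Week numbers start at 0; at least two weeks of data are required.
    """
    n = len(weekly_sales)
    mean_x = (n - 1) / 2
    mean_y = sum(weekly_sales) / n
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in enumerate(weekly_sales))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return lambda week: intercept + slope * week


def watch_sales(orders: Iterable[float],
                forecast: float) -> Iterator[tuple[float, bool]]:
    """Real-time step: after each order, yield the running total and
    whether it already exceeds the batch-computed forecast."""
    running = 0.0
    for amount in orders:
        running += amount
        yield running, running > forecast


# Hypothetical weekly sales for weeks 0 to 3 (the batch input).
history = [100.0, 110.0, 120.0, 130.0]
predict = fit_trend(history)
forecast_week_4 = predict(4)

# Hypothetical orders arriving during week 4 (the real-time input).
live_orders = [50.0, 60.0, 40.0]
alerts = list(watch_sales(live_orders, forecast_week_4))
```

The point of the design is the split: the model is rebuilt periodically from the full history, while each incoming order only triggers a cheap comparison, so a signal such as "demand is ahead of forecast" can be raised as soon as it becomes true.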

The advantages are obvious for e-commerce (recommendation) sites, as well as for marketing in general, where browsing history data can be combined with the very latest information from social networks. For banks, creating a "data lake" in which all market data (internal and external) are compiled with no volume restrictions makes it possible to develop predictive programs that integrate other types of data. In industry, the same approach makes it possible to extract pertinent information from huge volumes of data in order to anticipate several different scenarios (predictive maintenance).

At the end of the day, this concerns all business sectors, from agriculture to wholesale distribution, from service provision to digital service providers, from manufacturing to the public sector, and so on. The advent of this new type of tool gives companies unprecedented analytical potential and will help them align more accurately with the current reality of their business. Talend is the only player in the big data arena to offer, on the one hand, a solution for designing data transformations and processing aimed specifically at capitalizing on both batch and real-time data integration, and on the other hand, a Big Data solution that integrates all of the traditional integration functions (Data Quality, MDM, Data Governance, etc.), addressing the needs of the largest IT organizations, for whom an Enterprise Ready solution is simply not optional.
