How Apache Spark™ Feeds Real-Time Sports Analytics


With the Euro 2016 tournament drawing to a close, two kids of my own in a competitive soccer league, and even a French husband, I now live and breathe football (or soccer, depending on your frame of reference) almost every single day.

Soccer, much like today’s businesses, has begun to embrace the world of big data. The way teams judge how well a player or team will perform over the course of a game, or even a whole season, is shifting from organic, gut-instinct analysis gathered simply by watching games to far more involved analyses and game plans driven by algorithms, formulas, figures and data.

Within modern stadiums and practice fields around the world, professional players are monitored not only by video cameras but also by all sorts of devices such as accelerometers, heart rate sensors, and even GPS-like location systems or RFID chips. The reason? Using big data to evaluate conditions, challenges, player fitness, performance and more has been proven time and again to provide an incredible edge over the competition. The 2016 Euro Cup is yet another opportunity to pay special attention to the impact big data is having not only on European football, but also on the sporting industry at large.

Smart sensors, motion tracking cameras and real-time big data technologies like Apache Spark could make arguing with the referee over a missed call or game planning based on gut instinct alone a thing of the past.

Technology Gets a Starting Spot

Today’s real-time big data technologies should be front and center in the world of sports because they allow coaches, athletes, teams and owners alike to benefit, not only through improved performance and notoriety on the world stage, but also through profits and long-term success. The benefits of big data also extend to fans, who can use the latest statistics to improve their ‘fantasy football league’ rankings or betting scenarios. In short, the use of big data in sport enables:

·   Precision training (just ask the German national team, winners of the 2014 FIFA World Cup, how IoT data from their Adidas miCoach helped them)

·   Maximized player performance on the field

·   Game planning against opponents’ tendencies, trends and weaknesses

·   Injury prevention

Additionally, perhaps in the near future, we will see Big Data also deliver:

·   Help with offside calls, where the assistant referee has to keep track of two players at the same time (similar technology is already used for goal-line decisions)

·   Drone-enabled assistance for referees to ensure accuracy.

It’s both an exciting and disruptive time for sports worldwide, particularly with the 2016 Rio Olympic Games just around the corner. So what are the technologies that make all this possible? Let’s take a look behind the scenes at one particular real-time player analysis tool.

Real-Time Player Data at the Speed of Spark

Recently, a customer asked Talend to put together a technology stack that could use video camera images to track player statistics on a soccer field, such as speed and distance traveled, player stamina, and distance from the player to the goal, and then generate predictive analysis of all possible game outcomes.

The technologies used for this demonstration include:

  1. Data streams from a video camera tracking system that tracks all movement on the field: the ball as well as every player on both teams
  2. Talend Real-Time Big Data integration software
  3. A MySQL database, used to store the aggregated results of the Talend processing and to serve as the fast data layer for visualization
  4. Kafka as the queuing technology: the data captured by the cameras is streamed into Kafka and then read by Talend’s Real-Time Big Data integration software
  5. Apache Spark Streaming to run the Talend jobs that process the data
  6. Talend REST services to pull the aggregated data from MySQL and serve it up to the visualization layer
  7. A data visualization tool that displays, in real time, each player’s speed and distance as calculated from the cameras’ streaming data

Our goal was to aggregate the speed and distance of each player in the stadium in real time. To accomplish that, 24 separate cameras were placed around the stadium to capture the position of each player and the ball throughout the game.

[Figure: Real-time sports analytics scenario overview]

[Figure: Football’s (soccer) fastest players according to player data]

The camera array sends a feed of 25 frames per second, and each frame captures the x, y, z coordinates of every player and the ball. This real-time capture generates a huge amount of information that must be analyzed in a short amount of time: upwards of 400 to 500 gigabits per match.
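To make the speed and distance calculation concrete, here is a minimal Python sketch that derives both from one player's tracked positions. It assumes the positions arrive as (x, y, z) tuples in metres at the 25 frames per second described above; the function names and the sample track are hypothetical, and this is an illustration of the arithmetic, not Talend's implementation.

```python
import math
from typing import List, Sequence, Tuple

FPS = 25  # frame rate of the camera feed, as stated in the article
Position = Tuple[float, float, float]  # (x, y, z); metres assumed for illustration


def distance_covered(positions: Sequence[Position]) -> float:
    """Total distance: the sum of straight-line steps between consecutive frames."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def frame_speeds(positions: Sequence[Position], fps: int = FPS) -> List[float]:
    """Speed between each pair of consecutive frames (step length / frame interval)."""
    return [math.dist(a, b) * fps for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Hypothetical one-second track: the player moves 0.3 m along x every frame.
    track = [(0.3 * i, 0.0, 0.0) for i in range(FPS + 1)]
    print(f"distance: {distance_covered(track):.2f} m")  # distance: 7.50 m
    print(f"top speed: {max(frame_speeds(track)):.2f} m/s")  # top speed: 7.50 m/s
```

Differentiating raw positions frame by frame amplifies tracking noise, so a production system would likely smooth the positions first; that step is left out to keep the sketch short.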

The data flow begins with the cameras streaming data in JSON format to the Kafka message broker. Talend jobs running in Spark Streaming mode then process and transform the data in real time and write the results to MySQL, which is queried by a Talend REST service and served up to a web application. This all sounds pretty complex, but with Talend you will see that building the jobs, deploying them to the Spark cluster, and deploying the REST service is actually quite straightforward.
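The transform step in that flow can be sketched in plain Python to show the shape of the logic. This is a hypothetical stand-in, not Talend's generated Spark code: a list of JSON strings plays the role of one Kafka micro-batch, each message is assumed to carry `frame`, `player_id`, `x`, `y` and `z` fields (the article does not specify the schema), per-player state is carried between batches the way a stateful streaming aggregation would, and an in-memory SQLite table stands in for MySQL.

```python
import json
import math
import sqlite3

FPS = 25  # frame rate of the camera feed, as stated in the article


def process_batch(messages, state):
    """Fold one micro-batch of JSON messages into per-player running totals.

    Hypothetical message schema: {"frame": int, "player_id": str, "x": .., "y": .., "z": ..}.
    `state` persists between batches, like a stateful streaming aggregation.
    """
    for raw in messages:
        event = json.loads(raw)
        frame = event["frame"]
        pos = (event["x"], event["y"], event["z"])
        player = state.setdefault(
            event["player_id"],
            {"frame": None, "pos": None, "distance": 0.0, "max_speed": 0.0},
        )
        if player["frame"] is not None:
            if frame <= player["frame"]:
                continue  # skip duplicate or out-of-order frames
            step = math.dist(player["pos"], pos)
            elapsed = (frame - player["frame"]) / FPS  # seconds since last seen
            player["distance"] += step
            player["max_speed"] = max(player["max_speed"], step / elapsed)
        player["frame"], player["pos"] = frame, pos
    return state


def init_store(conn):
    """Create the aggregate table (SQLite standing in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS player_stats ("
        "player_id TEXT PRIMARY KEY, distance_m REAL, max_speed_ms REAL)"
    )


def write_aggregates(conn, state):
    """Overwrite each player's row with the latest running totals."""
    conn.executemany(
        "INSERT OR REPLACE INTO player_stats VALUES (?, ?, ?)",
        [(pid, s["distance"], s["max_speed"]) for pid, s in state.items()],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_store(conn)
    state = {}
    batch = [
        json.dumps({"frame": f, "player_id": "p7", "x": 0.2 * f, "y": 0.0, "z": 0.0})
        for f in range(3)
    ]
    write_aggregates(conn, process_batch(batch, state))
    # A REST endpoint feeding the dashboard might run a query like this one.
    for row in conn.execute("SELECT player_id, distance_m, max_speed_ms FROM player_stats"):
        print(row)
```

Tracking the last frame number per player, rather than assuming every message arrives exactly 1/25 s after the previous one, keeps the speed estimate sensible if frames are dropped between micro-batches.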

What Can Your Team Do with Big Data?

There is no doubt that big data analytics from projects like the one described above can provide a huge competitive advantage to sports clubs. Just imagine if your coach or general manager were able to use data to answer these questions:

Team stats

  • Who is most likely to score next? Which team is more likely to win?
  • How do opposing teams tactically position their players throughout the game?

Individual player stats

  • Which player needs a rest?
  • Which player is at risk of overuse or injury?
  • Which players are on a downward trend and could be traded? (This question also requires historical data.)

With real-time big data, sports teams around the world are primed to reap the multi-layered benefits of becoming truly data-driven. To learn more about Talend’s real-time big data capabilities, start here and find out how real-time big data is becoming the new reality.


