What the NFL Still Needs to Learn about Big Data
Praying for next season will continue until the league gets better at data integration.
Nobody loves stats and data more than football fans. From yards-after-catch (YAC) to possible correlations between the NFC winning the Super Bowl and a Republican winning the White House, rabid fans follow every conceivable story the numbers might tell.
There is no organization more interested in big data analytics than the NFL, a $13 billion juggernaut where data is directly tied to profits. That’s why the NFL now applies big data analytics to every aspect of… well, everything: not only current players and plays, but also the talent assessment process. Did you catch any of the NFL draft last week? It didn’t take long before the choices left everyone, from the NFL Network broadcast team to the fans, speechless.
So how are players evaluated and ranked? There are innumerable factors that teams consider, but here are just a few of the metrics that are collected and monitored for selecting players:
- Game stats: At least three games are taped for every prospect, analyzed by scouts, and broken down into more than a hundred statistical categories tailored to that player’s position.
- Trait-based analysis: translating a player’s more subjective qualities, such as processing speed, pattern recognition, and leadership on the field, into metrics.
- Cross-checking: applying multiple human evaluations (also known as “the eyeball test”) to the metrics.
- Grading: comparing the stats and trends of prospects to other prospects in the draft.
- Scheme fit: evaluating how the measured skills and grades would translate from college to the NFL.
Experts have even designed a system of numerical values for draft picks so that teams can analyze draft-day deals in terms of numbers rather than intangible value. Teams also take into account whether a player sat out their bowl game before the draft. The list goes on, but you get the idea.
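To make the idea of a numerical draft-pick value system concrete, here is a minimal sketch in Python. It is not any team's actual chart: the `pick_value` function, its `top_value` and `decay` parameters, and the exponential shape are all assumptions chosen purely for illustration. The point is only that once every pick has a number, a trade can be checked with simple arithmetic.

```python
import math


def pick_value(pick: int, top_value: float = 3000.0, decay: float = 0.03) -> float:
    """Hypothetical point value of a draft pick.

    Assumption: value decays exponentially with pick number. Real charts
    are hand-tuned tables; this curve is only a stand-in.
    """
    if pick < 1:
        raise ValueError("pick numbers start at 1")
    return top_value * math.exp(-decay * (pick - 1))


def trade_balance(picks_given, picks_received) -> float:
    """Return received value minus given value.

    A positive result means the team gets more chart value than it gives up.
    """
    given = sum(pick_value(p) for p in picks_given)
    received = sum(pick_value(p) for p in picks_received)
    return received - given


# Example: trading away pick 1 for picks 20 and 40 (made-up scenario).
balance = trade_balance(picks_given=[1], picks_received=[20, 40])
print("fair or better" if balance >= 0 else "giving up value")
```

Under these made-up parameters, two mid-round picks do not add up to the top pick, which is the kind of quick sanity check such a chart is meant to enable.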
However, even with all this data at teams’ fingertips, all too often the analysis doesn’t translate into on-field NFL production. For example, back in 2014, after all the previously mentioned analysis, the Cleveland Browns still drafted Johnny Manziel in the first round. How’d that work out? Even casual sports fans could rattle off a dozen high-profile NFL draft busts over the past couple of decades: Ryan Leaf, JaMarcus Russell, Tim Couch.
Mel Kiper’s Big Board from last year is already littered with bitter 2016 disappointments. In fact, about half of first-round quarterback picks are considered by many to be busts, as are the majority of running backs drafted in the first and second rounds. With each bad pick, even in the later rounds, amounting to a multi-million-dollar mistake, you’d think teams would be doing everything they can to minimize the chance of failure.
So what’s the real problem? Don’t get me wrong, there is always the human element that data can’t protect us from.
So if the problem isn’t the elephant in the room, what is it? Since we’re a big data organization, I bet you’re expecting me to say better big data solutions are the answer. We’re a hammer, and everything’s a nail, right? Not quite. But getting better at one aspect of big data analytics could really help NFL teams pick the right players on draft day: better data integration.
Data integration is all about making better use of all the data you have, not just the select pieces that work with the analytics solution you happen to be using. Many of today’s analytics solutions use only internal structured data. This severely limits the view of the bigger picture, because unstructured data from nontraditional sources is becoming more prevalent and growing faster than structured data. There has been some excellent work on native JSON ingestion, though, so things are getting better!
These unstructured data sources include everything from real-time video used to analyze the acceleration of a running back making sharp cuts to social media posts that might indicate a player’s propensity for visiting bars at 2:00 a.m.
How can you gain meaningful insights if you can’t connect to all this data and deliver it to your analytics platform?
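As a rough illustration of what connecting these sources can look like, the Python sketch below joins a structured stats export with semi-structured JSON notes keyed by player. Everything in it is hypothetical: the player IDs, field names, sample records, and the `integrate` function are invented for the example, and a real pipeline would also have to handle video, identity matching, and far messier input.

```python
import csv
import io
import json

# Hypothetical structured source: a CSV export of game stats.
STATS_CSV = """player_id,position,yards_after_catch
p1,WR,412
p2,RB,188
"""

# Hypothetical semi-structured source: one JSON document per line.
# Records do not all share the same fields (only some carry "tags").
NOTES_JSONL = """{"player_id": "p1", "source": "scout", "text": "Reads coverage early"}
{"player_id": "p2", "source": "social", "text": "Posted at 2:00 a.m.", "tags": ["late_night"]}
{"player_id": "p2", "source": "scout", "text": "Sharp cuts, good acceleration"}
"""


def integrate(stats_csv: str, notes_jsonl: str) -> dict:
    """Build one profile per player from both sources."""
    profiles = {}
    for row in csv.DictReader(io.StringIO(stats_csv)):
        profiles[row["player_id"]] = {
            "position": row["position"],
            "yards_after_catch": int(row["yards_after_catch"]),
            "notes": [],
        }
    for line in notes_jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines
        note = json.loads(line)
        profile = profiles.get(note["player_id"])
        if profile is None:
            continue  # no structured record to attach this note to
        profile["notes"].append({
            "source": note["source"],
            "text": note["text"],
            "tags": note.get("tags", []),  # tolerate missing fields
        })
    return profiles


profiles = integrate(STATS_CSV, NOTES_JSONL)
for player_id, profile in profiles.items():
    print(player_id, profile["position"], len(profile["notes"]), "notes")
```

The design choice worth noticing is that the unstructured side is allowed to be irregular: missing fields get defaults instead of causing the record to be dropped, so the analytics layer sees every source rather than only the tidy one.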
Even with all the data collected, if it takes forever to run a query, you’re going to be up a creek without a paddle. That report had better take less than 10 minutes to generate, or you’re going to end up on ESPN the next day.
On the other hand, if you’ve got a massively parallel environment that can ingest, process, enrich, and cleanse data at the power and scale of Apache Spark, or if you can automate processes to accelerate results, then you’ve got a competitive advantage.
Better integration isn’t just an urgent imperative for the NFL draft—it also applies to all kinds of other important decisions.
Let’s take foreign aid, for example. Right now, U.S. aid organizations use analytics to help guide decisions about what, where, and how much to invest in aid programs, but they have a high miss rate: many investments show no demonstrable payoff.
Hospitals use analytics to assist with everything from diagnosis to treatment options. While they’ve made great strides in using big data strategies, the results are far from reliable and still require a great deal of human intervention.
Industrial machine makers use big data to try to predict failures and outages so they can proactively prevent downtime and save time and money—but these strategies are not widely utilized.
The key to higher success rates in all these use cases is more effective use of data integration and an outcomes-based approach to analytics queries.
In our age of hyper-convergence, it’s time for data to converge too, so that we can use all of it to make better decisions in less time. Until then, we’re just going to have to continue scratching our heads and saying… “Blaine Gabbert in the first round?!? Are you freaking kidding me?!?”