Gooooaaaallll: Winning at the data game
Sometimes the lifestyles of the rich and famous aren’t as glamorous as they seem at first glance. We all know that professional athletes can make incredible amounts of money. But by the age of 35, most pro athletes are already at the end of their prime earning years. Historically, a lot of them haven’t managed their money well — and they may even go bankrupt in retirement.
CRED Investments saw an opportunity to change this, starting with the soccer industry. The company works with young athletes playing in Europe's top six leagues and provides them with capital to start investing toward retirement. In exchange, the company takes a percentage of each player’s career-related income, including both on-field wages and off-field income such as endorsements.
“The better they do in their career, the better we do,” says Cherry Shah, Director of Engineering at CRED. “We provide a financial service, but we’re very much a data company. The reason we're able to provide money to these athletes is that data makes it possible to predict their future.”
For example, say you have a young player, 21 years old, playing for Manchester United, one of the top clubs in England's Premier League. How can you predict how much he’s going to make over his career? You have to consider all the other clubs he might play for, any injuries he could suffer, the endorsement opportunities that might come his way, and when he is likely to retire. Together, these factors let CRED Investments project the player’s cash flow over his entire career. That requires a lot of data.
For CRED, the magic of predicting the future operates in three stages: data collection, data science, and data visualization. “You need a data warehouse if you’re going to apply data science between collecting the data and building the data visualization,” explains Shah. “Once we realized we needed to move the data from Postgres to BigQuery, we knew we needed an ETL solution — ideally one that was very easy to use. We didn't want to code it. We wanted maintenance to be simple enough that anyone could do it.”
Stitch made it easy for the company to establish that data warehouse as a single source of truth. “Thanks to Stitch, everything's very normalized,” says Shah. “There are no duplicates, no suspicious calculations. It's very fact-based. Now we can make the data science projects work better and make them easier to maintain.”
Even better, with the Stitch Unlimited plan, CRED was able to process the massive volume of data required to fuel its business, approaching one billion rows per month.
Data collection, data science, and data visualization
Currently, the company is collecting data from numerous paid and free data sources, both private and public. “We use APIs, we collect .csv data exports, and we also do a lot of our own scraping,” says Shah. “We collect data on every kick that happens in every game. Every wage that's earned by a player. Every win recorded by a manager. Every revenue that's earned by a team. Every broadcasting deal that’s signed by a league. Every transfer that is arranged by agents. Every financial metric for sponsors. Every indicator used to measure a country’s economic health. We even collect social media data about all these entities.”
The key to prediction is historical data that can be used to train machine learning models, so the company collects data going as far back as it can. Shah continues, “We have a big back-end team basically focused on data collection. Every day, sometimes hourly, we're collecting the latest data available. Then Stitch helps us move all of it to our BigQuery data warehouse.”
The company isn’t collecting data out of simple curiosity; it needs all those sources to fuel its analytical models. As Shah explains, “Wages, for example, are particularly hard to get, and hard to get accurate data on. So you have to establish multiple sources for wage data and then rely on the data science team to figure out which one is the most accurate for each player.”
The data science team uses all the data the company has collected to build complex machine learning models. These models help predict all the different factors that feed into a player’s projected cash flow. But they also help identify which sponsors would be a good fit for the player, based on their social media followers, demographics, geographies, and a variety of other factors.
To inform the business and sales teams, CRED Investments feeds the data into its data visualization tool, Looker. Before the company established the BigQuery data warehouse and the Stitch workflow, this process was painfully slow. “It was sometimes taking more than 10 seconds to load dashboards, which is too long in today's world,” recalls Shah. “If it took 10 seconds to load a web page like Facebook… well, nobody would be using Facebook.”
Shah had used Stitch before, so he was confident it would work at CRED. “I think we definitely read through all the documentation for several solutions, but we didn't implement any other ones. We tried Stitch first and we liked it.”
Right now, the company primarily uses Stitch to replicate data from its databases into the BigQuery data warehouse, but that’s just the beginning. Shah reports that the company has big plans for Stitch: “Speed was obviously the initial motivation for using it, that and the fact that it’s a good software architecture practice to have a data warehouse. Stitch helps us follow best practices in terms of coding and software engineering. But you also have integrations with other sources, like Google Analytics, Branch, Stripe, and Slack. We’re in the process of adding that functionality.”
Above all, Shah advises other entrepreneurs to always think long-term. “When you’re initially building your architecture and tool set, just think where you could be eventually and try to plan for that. You don't need to implement everything today, but think in terms of scale and dream big. I know Stitch is going to be key to our growth moving forward.”