Sloan Recap: Moneyball, Machine Learning, and Large Language Models

Since 2011, I have been attending the MIT Sloan Sports Analytics Conference in Boston (including the 2021 remote edition) and have always enjoyed it, mostly for catching up with friends in the industry but also for hearing and discussing (and sometimes presenting) innovations in the sports analytics space. This year was no different, with many entertaining panels, interesting talks and research papers. One thing did stand out this year, though: the uncertainty and hesitancy around how AI will impact the sports industry and whether it might take human jobs. Now that a week has passed since the conference, I have been able to reflect and summarise my five key takeaways. The last two address this issue, which was also the topic of both my presentation and the panel I was on. Since that is what I spent most of the conference discussing, I will go into the most detail on those two topics.

20 Years on from Moneyball: It is hard to believe, but it has been 20 years since Michael Lewis’s “Moneyball” was released. To commemorate the occasion, there was a very entertaining panel featuring Michael Lewis, Shane Battier, Bill James and Daryl Morey, moderated by Jackie MacMullan. The key theme (and I think this often gets lost on people) was that Moneyball isn’t just a sports story. It uses sport (specifically baseball’s Oakland A’s) as an example of how data and analytics, used as an assistive tool to measure process and value resources, can optimise how a business runs. This can be a massive competitive advantage if your competitors aren’t doing the same.
The Best Investment is in Women’s Sport: As highlighted by numerous people across many panels, the best investment currently is in women’s sport. This view stems from the rise in popularity, franchise valuations and broadcast rights for the WNBA, women’s soccer leagues in Europe, and the recent launch of the T20 Cricket Women’s Premier League in India. Additionally, we have found that women’s sport is a great way to showcase new things we have done with our great WTA partners and numerous soccer, basketball and cricket competitions.
Live Player Props: The rise in betting in the US was also a prominent topic at the conference. In addition to the logistics and hurdles of getting states on board with gambling, another recurring theme was the need for live player props, and for low-latency data and models to keep them updated. From our perspective, this is something we have heard globally (not just in the US) and are already leading the charge to deliver. Our Betting Innovation Centre partnership with Sporting Solutions is a recent example. Watch this space for more from us…
ChatGPT and How it Applies to Sport: Over the last three months, with the introduction of ChatGPT, a lot of press and interest has centred on the use of Generative AI and Large Language Models (see our two articles here for a deep dive: Part 1 & Part 2). I gave a talk on this topic on Friday afternoon at the conference. The key points that I tried to convey were:
1. Current Large Language Models (LLMs) such as ChatGPT hallucinate facts, which is very problematic in sport.
2. To enable chatbots in sport, you need to take a facts-first approach and draw on sports data that is live, trusted and across all sports (like we do at Stats Perform).
3. The language of sport is not natural language text like that used in large language models. It is its own language, consisting of sports text (stats like shots, tackles and passes) as well as a visual mode (positional data showing player locations and movement).
4. Using the visual language, we can expand and scale the language of sport to find new patterns that help teams and media analyse and tell better stories. Great examples are our new Opta Vision soccer metrics like line-breaking passes and pressure, which reveal previously unseeable layers of detail that make games more compelling, help teams find hidden player strengths, and support better predictions around tactics and strategy.
5. The future of sports analytics is to use this data to build large language models (both from the derived discrete statistics and as visual language models), which can be used for even more versatile and comprehensive predictions across sport, to help teams make better decisions pre-game and in-game, and to make media coverage even more compelling and available, driving fan growth and attention.
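
To make the facts-first idea in point 2 a little more concrete, here is a minimal sketch in Python. It is purely illustrative and is not how any Stats Perform product works: the `TRUSTED_STATS` store, the `lookup` and `answer` helpers, and the placeholder players and numbers are all assumptions made up for this example. The point it tries to show is simply that the number comes from a trusted data source first, and the language is wrapped around it second, so the system declines rather than guesses when the fact is missing.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, hand-made store standing in for a live, trusted data feed.
# The players and numbers are placeholders, not real data.
TRUSTED_STATS = {
    ("Player A", "shots"): 4,
    ("Player A", "passes"): 52,
    ("Player B", "tackles"): 7,
}


@dataclass
class Answer:
    text: str
    grounded: bool  # True only when the value came from the trusted store


def lookup(player: str, stat: str) -> Optional[int]:
    """Return the stored value, or None when the store has no such fact."""
    return TRUSTED_STATS.get((player, stat))


def answer(player: str, stat: str) -> Answer:
    """Facts first: fetch the fact, then wrap it in language. Never guess."""
    value = lookup(player, stat)
    if value is None:
        return Answer(f"I don't have trusted data on {stat} for {player}.", False)
    return Answer(f"{player} has {value} {stat}.", True)


if __name__ == "__main__":
    print(answer("Player A", "shots").text)
    print(answer("Player B", "shots").text)
```

In a real system the templated sentence could be replaced by a language model that is only allowed to phrase the retrieved value, but the ordering (retrieve, then generate) is the part that matters here.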

I received great feedback on the presentation. Formalising sport as its own language and creating language models on these specific datasets seemed to resonate really well. This trend of using language models also appeared in a couple of the research papers, mirroring the approach that we at Stats Perform have taken over the last couple of years in building the new products mentioned above.

Concern about the “Rise of the Machines”: The GPT discussion led to more interesting conversations on the Saturday afternoon panel I was on, where we were asked to address any potential adverse consequences and limitations of AI in sport. The key points that I raised were:

1. Data for a lot of the world is not digitised or available, and this is very much true in sport. Whilst positional data for every player, on and off the ball, greatly expands our capacity to analyse match situations and make predictions, there are many other factors which will never feed into our models. That includes private data like health, sleep and nutrition data, or even whether two players on the same team dislike each other that day. This could be seen as a limitation. I see it as a fundamental part of the beauty of sport (well, human sport at least). Sports analytics is reaching a mature state in capturing and analysing within-match performance, especially with the recent advancements. However, there are natural and imposed guardrails in place, such as CBAs and other regulations restricting the availability of private data, which will ensure that a human has oversight of the final decision. It is like a pilot on a plane: they can rely on autopilot for the most part and intervene when needed, but they will always be needed on the plane. We believe that is the sweet spot of AI technology: creating assistive tools that help human domain experts do their jobs better. I can’t see anything changing that anytime soon.
2. I firmly believe that the opportunities created by AI in sport far outweigh any human capital cost. If “machines are rising,” it is either to do highly repetitive and time-consuming jobs or to help us scale analytical outputs, and if anything, that creates more scope for human endeavour. For example, AI helps us spot potential data collection anomalies earlier, which makes in-game live stats more accurate and lets our analysts confidently collect more of them for more games. This means we can power new stories. As such, AI is another tool in the toolbox that empowers our clients and us to do more and, ultimately, to make sport even more captivating. There are still a great many untold tales, but AI is helping us enable more of them to be told. See my earlier comment about women’s sport, for example.
3. Trust, Reliability and AI Security: With AI technology becoming more advanced, we need to be mindful of where our sports data comes from and whether it is up to date (which I highlighted with the ChatGPT example). Also, with the rise of deep fakes for voice and video, there is a growing need to verify the authenticity of all sports data. An example that I used on the panel was to imagine someone using “deep fake” technology to generate highlights of their own performance from historical footage of Giannis or other emerging stars in basketball. One strategy is “zero trust,” where a person is on hand to verify that the athlete shown is actually the person of interest. An alternative is to use a trusted data and analytics provider that does that verification itself. This is already underway in areas outside of sport, and AI Security, which can be used to verify whether a piece of content is real or generated, is becoming a must-have in this AI world. That is why knowing where and how all your data (and the AI outputs on top of that data) are created, and trusting that source, is going to be a central pillar when devising a data and AI governance strategy.

Overall, the conference was a lot of fun to attend. After years of social distancing and video conferencing, being able to meet as a community in one place at one time was a thoroughly enjoyable experience. To that end, we have our Opta Forum coming up shortly in London, which I can’t wait for, based on the speakers, research papers and innovations that will be showcased. We hope to see everyone there!

Dr. Patrick Lucey is the Chief Scientist at sports data giants Stats Perform, leading the AI team with the goal of maximizing the value of the company’s deep treasure troves of sports data. Patrick has studied and worked in the AI field for the past 20 years, holding research positions at Disney Research and the Robotics Institute at Carnegie Mellon University as well as spending time at IBM’s T.J. Watson Research Center while pursuing his Ph.D. Patrick hails from Australia where he received his BEng(EE) from the University of Southern Queensland and his doctorate from Queensland University of Technology. He has authored more than 100 peer-reviewed papers and has been a co-author on papers in the MIT Sloan Best Research Paper Track, winning best paper in 2016 and runner-up in 2017 and 2018.
