A common problem in recruitment and player evaluation is comparing players who are at different stages of their careers or who have followed different career pathways.
As an admittedly extreme example, consider Marcus Rashford and Wayne Rooney at the start of this season. One was an 18-year-old striker with only 11 league appearances; the other was a 30-year-old striker with well over 400 league appearances.
In this blog I use a relatively simple statistic, conversion percentage (goals as a percentage of shots), as a template to show how altering our assumptions can improve player evaluation.
Our approach must consider not only a player’s past performance but also the differing levels of uncertainty that surround our projections of future performance.
We are now at a phase in football analytics where more advanced metrics such as Expected Goals should feature when evaluating forward players. However, conversion rate can still act as a good starting point, because certain players consistently display an above-average conversion rate.
So how do we pick these players out from the pack?
Starting again: back to basics
I decided to follow an approach used by David Robinson to estimate baseball batting averages. I’ll outline it from beginning to end, and the underlying processes behind this tweaked way of thinking will become clear as the article progresses.
First, I looked at the conversion percentage of every player in the Premier League, La Liga, Bundesliga, Serie A and Ligue 1 from 2005-06 to 2015-16. A simple table reveals the best conversion percentages during this period.
Despite their seemingly excellent conversion rates, we can safely say that Marcel Ziemer, with 37 minutes of Bundesliga football in his career, and goalkeeper Tim Howard are probably not among the best finishers in these competitions.
The problem here is obvious: none of these players have taken enough shots, so their conversion percentages are skewed by the small sample. So what happens if we put a shots filter on our table, and only look at players who have taken more than 30 shots?
The list now looks a little better, but the highest conversion percentage still belongs to a right back, Paul Verhaegh. Although a few names start to make more sense, such as strikers Bas Dost and Dario Cvitanich, the list still feels wholly unsatisfactory.
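The raw ranking and the shot filter described above can be sketched in a few lines of Python. This is only an illustration: the `PlayerRecord` class, the `raw_leaderboard` function and the player records are made up for the example and are not the data behind the tables in this article.

```python
from dataclasses import dataclass


@dataclass
class PlayerRecord:
    name: str
    goals: int
    shots: int

    @property
    def conversion(self) -> float:
        """Raw conversion percentage: goals per shot, times 100."""
        return 100.0 * self.goals / self.shots if self.shots else 0.0


def raw_leaderboard(players, min_shots=0, top=5):
    """Rank by raw conversion, keeping players with more than min_shots shots."""
    eligible = [p for p in players if p.shots > min_shots]
    return sorted(eligible, key=lambda p: p.conversion, reverse=True)[:top]


# Made-up records for illustration only, not real player data.
players = [
    PlayerRecord("Player A", goals=1, shots=1),
    PlayerRecord("Player B", goals=2, shots=4),
    PlayerRecord("Player C", goals=9, shots=45),
    PlayerRecord("Player D", goals=120, shots=800),
    PlayerRecord("Player E", goals=30, shots=400),
]

unfiltered = raw_leaderboard(players)
filtered = raw_leaderboard(players, min_shots=30)
print([p.name for p in unfiltered])
print([p.name for p in filtered])
```

Without the filter, the one-shot player tops the list. With it, a 45-shot player still ranks above an 800-shot player, which is the same unsatisfactory pattern seen in the filtered table.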
A new diagnosis…
Taking a step back, we can start to diagnose the problem and use a more informed approach. In general, about one in every 11 shots is scored, so we can start by assuming every player converts at this average rate. The average conversion percentage for players with 30 or more shots in this sample is 9.12%.
It is clear that, on average, the more shots a player takes, the closer his conversion percentage gets to the average (represented by the red line), whereas there is much higher variance amongst players who have taken fewer shots. This statistical phenomenon, called “regression to the mean”, suggests that even if a player has a very high conversion percentage in a small sample, like Paul Verhaegh, we would expect it to revert to somewhere closer to the average over time.
However, this can over-simplify events and outcomes, as we know other factors can play a role. To adjust for this phenomenon, we can use what is called an Empirical Bayesian approach. Constantinos Chappas used a similar approach at the OptaPro Analytics Forum to look at year-to-year uncertainty with similar aggregated metrics.
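The small-sample noise behind this pattern can be reproduced with a quick simulation. This is a minimal sketch under a strong assumption: every simulated player has exactly the same underlying conversion rate of 9.12%, so any differences between them are pure chance. The function name and sample sizes are chosen for illustration.

```python
import random
import statistics

TRUE_RATE = 0.0912  # assumed: every simulated player has the same underlying skill


def simulate_rates(shots_per_player, n_players, rng):
    """Simulate observed conversion rates for players who all share TRUE_RATE."""
    rates = []
    for _ in range(n_players):
        goals = sum(rng.random() < TRUE_RATE for _ in range(shots_per_player))
        rates.append(goals / shots_per_player)
    return rates


rng = random.Random(0)  # fixed seed so the run is repeatable
few_shots = simulate_rates(shots_per_player=10, n_players=1000, rng=rng)
many_shots = simulate_rates(shots_per_player=500, n_players=1000, rng=rng)

# Spread of observed rates across players at each sample size.
spread_few = statistics.pstdev(few_shots)
spread_many = statistics.pstdev(many_shots)
print(f"spread with 10 shots:  {spread_few:.3f}")
print(f"spread with 500 shots: {spread_many:.3f}")
```

Even with identical skill, the 10-shot players show a far wider spread of observed rates than the 500-shot players, which is why a leaderboard of raw percentages is dominated by small samples.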
Adjusting the start position
We start with a prior belief – in this case that every player begins with a conversion percentage of 9.12% – and every new data point (in this case a shot and its outcome) shifts our estimate away from the starting point.
Therefore, if a player takes one shot and scores, their estimated conversion percentage won’t immediately be 100%, as it was in the earlier cases. Instead, the estimate shifts only a small amount away from our prior view that their conversion percentage was 9.12% (this is detailed more in the mathematical appendix).
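The update step can be sketched as follows, assuming the prior is a Beta distribution, which is the usual choice for a rate like this. The prior mean of 9.12% comes from the article, but the prior strength of 100 "pseudo-shots" and the sample rates passed to `fit_beta_prior` are invented for illustration. The method-of-moments fit is a simple stand-in, since the article does not say how its prior was fitted.

```python
import statistics


def fit_beta_prior(rates):
    """Method-of-moments fit of a Beta(alpha, beta) prior to rates on a 0-1 scale."""
    mean = statistics.mean(rates)
    var = statistics.variance(rates)
    strength = mean * (1 - mean) / var - 1  # equals alpha + beta
    return mean * strength, (1 - mean) * strength


def shrunk_rate(goals, shots, alpha, beta):
    """Posterior mean conversion rate after updating the Beta prior with the data."""
    return (alpha + goals) / (alpha + beta + shots)


# Assumed prior: mean 9.12% as in the article, strength of 100 pseudo-shots (hypothetical).
PRIOR_MEAN = 0.0912
PRIOR_STRENGTH = 100.0
alpha0 = PRIOR_MEAN * PRIOR_STRENGTH
beta0 = (1 - PRIOR_MEAN) * PRIOR_STRENGTH

one_from_one = shrunk_rate(goals=1, shots=1, alpha=alpha0, beta=beta0)
print(f"1 goal from 1 shot -> {100 * one_from_one:.2f}%")

# Fitting a prior from observed rates (made-up values, not real players).
sample_rates = [0.05, 0.08, 0.09, 0.10, 0.12, 0.14]
fitted_alpha, fitted_beta = fit_beta_prior(sample_rates)
print(f"fitted prior mean: {fitted_alpha / (fitted_alpha + fitted_beta):.4f}")
```

Under these assumed values, a player with one goal from one shot moves from 9.12% to only about 10%, not to 100%. A weaker prior (a smaller strength) would let the same shot move the estimate further.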
We get a new leaderboard using this approach.
The featured players now start to look more reasonable. Amongst the top five we have Javier Hernandez, who has had an impressive scoring rate at both Manchester United and Bayer Leverkusen, as well as Carlos Bacca, who has been similarly clinical at Sevilla and AC Milan. However, there is still reason to be sceptical: right back Paul Verhaegh remains in the top five despite taking only 51 shots.
This brings us to the second trend in the chart. This isn’t so much a classical statistical trend as one rooted in an understanding of the game itself: players with more shots tend to have a higher-than-average conversion percentage.
The blue trend line makes this clear, but even just looking at players with more than 500 shots in their careers, almost all of them are above the conversion percentage sample average of 9.12%.
So why does this trend emerge? Essentially, good shooters shoot more. This may be down to good shooters getting more minutes, trusting themselves to shoot more often or trying to carve out shot opportunities more frequently, to teammates looking to pass the ball to them more frequently, or to a variety of other factors.
Shifting our beliefs
From an estimation point of view, this means that assuming every player has a conversion percentage of 9.12% before they have taken a single shot is flawed, because good shooters, who have taken more shots and have a higher conversion percentage, inflate this average.
We can look to address this problem by letting our starting estimate depend on the number of shots a player has already taken in their career, an approach known as Beta-Binomial regression. Within this new approach, we consider that good shooters will shoot more often.
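To show how a shots-dependent starting point changes the estimate, here is a rough sketch. It assumes the prior mean rises linearly with the logarithm of career shots, which is one common form for this kind of regression. The intercept, slope and prior strength below are hypothetical numbers picked for illustration, not coefficients fitted to the data in this article, and the sketch skips the fitting step entirely.

```python
import math


def prior_mean_for(shots, intercept, slope):
    """Prior mean conversion rate that rises with the log of career shots (shots >= 1)."""
    return intercept + slope * math.log(shots)


def regression_shrunk_rate(goals, shots, intercept, slope, strength):
    """Shrink towards a shots-dependent prior mean instead of one global average."""
    mu = prior_mean_for(shots, intercept, slope)
    alpha0 = mu * strength
    beta0 = (1 - mu) * strength
    return (alpha0 + goals) / (alpha0 + beta0 + shots)


# Hypothetical coefficients chosen for illustration; they are not fitted to any data.
INTERCEPT, SLOPE, STRENGTH = 0.06, 0.01, 100.0

# 5 goals from 16 shots is the Rashford line quoted in this article;
# 130 goals from 1000 shots is an invented high-volume shooter.
low_volume = regression_shrunk_rate(5, 16, INTERCEPT, SLOPE, STRENGTH)
high_volume = regression_shrunk_rate(130, 1000, INTERCEPT, SLOPE, STRENGTH)
print(f"5 goals from 16 shots     -> {100 * low_volume:.2f}%")
print(f"130 goals from 1000 shots -> {100 * high_volume:.2f}%")
```

Because the coefficients are made up, the output will not reproduce the estimates quoted in this article. The point is the mechanism: the low-volume player is pulled strongly towards a lower prior mean, while the high-volume player starts from a higher prior and is moved mostly by his own record.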
Finally, the leaderboard really starts to pass the eye test, with the top five comprising some of the most clinical forwards of the past decade.
This is just one of many ways to look at finishing ability. The same approach can be applied to other statistics and metrics to better inform how we benchmark and evaluate players.
However, this particular example shows how we can use fairly simple aggregate data – in this case essentially only goal and shot numbers – to gain meaningful insights and identify key player skill sets amongst a large and varied sample of players.
Finally, we’ll revisit our Rashford and Rooney example. Using this method, Rashford has an estimated conversion rate of 11.31% and Rooney has an estimated conversion rate of 13.24% (the same as his actual conversion percentage in this time frame). So, although Rashford scored five goals from 16 shots in his first Premier League season, a conversion rate of 31.25%, the estimate adjusts for the fact that he has taken far fewer shots than Rooney. It stops us from drawing the perhaps ill-informed conclusion that Rashford is already a better finisher than Rooney.