Latent F1 team and driver performance: fun with Bayesian models
bayesian
r
f1
stan
Published
December 27, 2024
For a bit of fun, let’s try and model the performance of F1 drivers and constructors in the 2024 season using Bayesian models. The aim is more to learn about using Stan than to determine deep truths about F1, since there’s far more to performance than we can easily model.
We’ll vaguely imitate the f1-metrics model, whose post-season analyses I used to eagerly await. In that model there are latent variables for driver skill and car quality, which predict points/positions. However I never did get hold of the paper behind it, and am instead following the amazing Rasmus Bååth’s blog, where he models the latent skill of football teams in La Liga.
I gathered CSVs of race data from the fantastic Ergast. The weapon of choice is R and the tidyverse, since nothing else is quite so ergonomic for this type of analysis. You can find the code behind the folds if you want to play along.
In the interests of being educational - and because Lord knows I hate editing - I am writing this post as I go, including any blind-alleys and debugging.
Let’s start with some visualisations of the data we have.
Figure 1: Histograms of driver performance in F1 2024.
The model goes like this. Points are simulated as a draw from a Poisson distribution parameterised by the driver performance and constructor performance.
We will therefore have a likelihood function that is the product of the Poisson density of the points for each driver in each race. The number of parameters is high, since there is a performance score for each driver and for each constructor.
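In symbols, it would look something like this (exactly how driver and constructor performance combine into one rate is a modelling choice; the multiplicative form shown here is what the final model in this post effectively uses):

\[
\text{pts}_{d,r} \sim \text{Poisson}\left(\lambda_{\text{drv}(d)} \, \lambda_{\text{ctr}(d)}\right),
\qquad
\mathcal{L} = \prod_{d}\prod_{r} f_{\text{Poisson}}\left(\text{pts}_{d,r};\; \lambda_{\text{drv}(d)} \, \lambda_{\text{ctr}(d)}\right)
\]

where \(d\) indexes drivers and \(r\) rounds.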
Constructor performance
Let’s start simpler, with a view of just the constructor performance. Assume that at least one driver for every constructor maximised the car’s performance each weekend. THIS ISN’T TRUE. If Albon had a poor weekend, Sargeant wasn’t going to step up, to pick one of many examples from the 2024 grid. However it is a useful approximation that lets us analyse just the constructor performance.
Figure 2: Histogram of re-ranked constructor results in F1 2024.
A key point here is that we’ve recalculated positions using only the max driver’s position for each constructor, i.e. there are now only 10 finishing positions.
Again we can simulate the position \(p\) as a draw from a Poisson distribution, this time where \(\lambda_\text{ctr}\) is the performance of the constructor only.
\[
p \sim \text{Poisson}(\lambda_{\text{ctr}})
\]
To keep things simple, let’s start with an uninformative prior, a uniform distribution for the performance of the constructors.
\[
\lambda_{\text{ctr}} \sim \text{Uniform}(1, 10)
\]
Then our likelihood function is much simpler. (We won’t actually need this, but I find it useful to write out for the sake of understanding.)
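For the record, it is just the product of Poisson densities over the ten constructors and the rounds:

\[
\mathcal{L}(\lambda_{\text{ctr}} \mid p) = \prod_{c=1}^{10}\prod_{r} \frac{\lambda_{\text{ctr}(c)}^{\,p_{c,r}} \, e^{-\lambda_{\text{ctr}(c)}}}{p_{c,r}!}
\]

where \(p_{c,r}\) is constructor \(c\)’s re-ranked position in round \(r\).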
Below the fold is the R code for interacting with it via CmdStanR. I settled on CmdStanR rather than RStan after hitting too many nameless runtime errors through RStan - see here.
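If you don’t want to open the fold, the CmdStanR workflow is roughly this. A minimal sketch - the file name and the data-list fields are illustrative, not the exact code behind the fold:

```r
library(cmdstanr)

# Compile the Stan program (cmdstan_model() caches the binary)
mod <- cmdstan_model("constructor_model.stan")

# Sample from the posterior; the data list must match the Stan data block
fit <- mod$sample(
  data = list(
    n_ctrs = 10,
    n_obs = nrow(ctr_positions),
    ctr_index = as.integer(factor(ctr_positions$constructorRef)),
    positions = ctr_positions$position
  ),
  chains = 4,
  seed = 1
)

fit$summary() # per-parameter posterior means, SDs and convergence diagnostics
```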
Figure 3: Posterior samples for constructor performance
On this plot lower x-values are better.
The interesting thing is that this suggests the Red Bull was clearly the third-fastest car. I’m a little suspicious of that, because it claimed a sequence of 1-2s in the first third of the season. Let’s take a closer look at that.
The first thing to check is whether my dumb “max driver, 10 positions” model is problematic. Let’s look at Red Bull’s results.
NA strikes again! Forgot to add the crucial na.rm parameter to the min. Right, a do-over.
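For anyone unfamiliar with this classic R footgun:

```r
# A DNF comes through as NA, and min() propagates NA by default,
# so one retirement poisoned the whole constructor-round.
min(c(3, NA))               # NA
min(c(3, NA), na.rm = TRUE) # 3
```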
Code
```r
ctr_positions <- f2024 |>
  group_by(round, constructorRef) |>
  summarise(
    # in the event of a double DNF, assign the last position
    quali_position = pmin(min(quali_position, na.rm = TRUE), 10),
    position = pmin(min(position, na.rm = TRUE), 10)
  ) |>
  mutate(
    # re-rank
    quali_position = rank(quali_position),
    position = rank(position)
  ) |>
  arrange(round, position)

ggplot(ctr_positions) +
  aes(x = position, fill = constructorRef) +
  facet_wrap(~constructorRef) +
  scale_fill_manual(values = ctr_colours$colour) +
  geom_histogram(alpha = 0.5, colour = "black", linewidth = 0.1,
                 binwidth = 1, show.legend = FALSE)
```
Figure 4: Constructor position histograms after fixing NA sorting.
That looks more like I’d expect. Let’s try Stan again.
Code
```r
fit <- run_stan(make_data_list(ctr_positions))
plot_posterior(fit)
```
Figure 5: Posterior samples after fixing NA sorting.
That is more intuitive: Red Bull is now much closer to McLaren and marginally ahead of Ferrari (despite coming third in the constructors’ championship). This tells us that yes, Verstappen probably would have won the drivers’ championship in a McLaren - but contrary to claims, not in a Ferrari, since it wasn’t faster than the Red Bull overall.
The three teams that are on top of each other are: Alpine, Haas, and RB. You can also just about discern that the model is less sure about the top teams, since their posteriors are slightly wider.
There’s a lot we can improve about this model. The most obvious issue is that our priors for constructor performance are uninformative, so our posteriors basically just reflect the data. Can we add some useful information to inform the model?
More informative priors
What makes some constructors better than others? Adrian Newey Money. Let’s reflect this in our priors.
This is a bit of a secret, so we have to estimate. We do know that the budget cap is 135M USD and it’s unlikely that anyone is operating below the cap. We also know that driver and top executive salaries are exempt from the cap. The top teams are spending about 100M USD on their drivers, and prior to the budget cap’s implementation likely invested in facilities that they can use for free (like wind tunnels).
With that in mind, I asked GPT-4o to search the web and estimate budgets for the teams. It came back with these results, which I sanity checked against this Blackbook Motorsport article.
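The exact translation from budgets to priors is behind the fold; one simple scheme (illustrative only - `ctr_budgets` is a hypothetical data frame holding the estimates) is to use each team’s budget rank as the prior mean position:

```r
# Hypothetical: ctr_budgets has columns constructorRef and budget (USD).
# Bigger budget => lower (better) expected finishing position.
budget_priors <- ctr_budgets |>
  mutate(mu = rank(-budget), sd = 2) # a generous SD keeps the prior weak
```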
Figure 6: Posterior samples from the model with budget-based priors.
This hasn’t made much difference. McLaren has eased back towards Red Bull because their budget is smaller. You could say they did a lot better than Red Bull relative to their budget.
This model suggests there’s next to nothing in it between Red Bull and McLaren, though the posterior distribution is wider for the top three teams than the others, so there is more uncertainty.
Was budget a bad choice for the prior? Not necessarily: this model of constructor performance incorporates an important factor that isn’t clear from the results, but which should affect an objective assessment of which teams are better. Or to put it another way, if I were Max Verstappen I’d still rather be driving the Red Bull than the McLaren next year, because Red Bull’s budget suggests a more capable team overall. (Not to mention the increased wind tunnel time that third place in the constructors’ gets relative to the winner.)
Figure 7: Comparison of prior and posterior samples.
Plotting the priors and posteriors together shows how McLaren over-performed and Mercedes continued to underperform.
Past performance
Another, very easy way to set the priors is to consider the constructors’ performance over the previous seasons with the same regulations, in this case 2022 and 2023 (the ground-effect era). This is really easy to do: we would just take a weighted mean of the constructors’ finishing positions as the prior \(\mu\), and perhaps set the variance proportional to the difference between the two seasons. Not particularly innovative or interesting though, so let’s skip it.
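For completeness, a sketch of what that would look like, assuming a hypothetical `ctr_history` data frame with one row per constructor per year and a `mean_position` column:

```r
past_priors <- ctr_history |>
  filter(year %in% c(2022, 2023)) |>
  arrange(constructorRef, year) |>
  group_by(constructorRef) |>
  summarise(
    mu = weighted.mean(mean_position, w = c(1, 2)), # weight 2023 above 2022
    sd = abs(diff(mean_position))                   # spread from year-to-year change
  )
```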
Car-limited circuits
Long-time F1 fans will know that at some tracks, the cars almost always finish in pairs, e.g. both Ferraris, then both Mercedes, and so on. Somehow at these circuits driver skill makes little difference; we could say the circuits are car-limited rather than driver-limited. That’s no fun to watch, but I wonder if it’s useful for establishing car performance?
Let’s look back at historic race results to see if some circuits are more likely to finish in two-by-two order. We can score this by the median difference in position between finishing drivers.
Code
```r
circuits2024 <- races |>
  merge(circuits) |>
  filter(year == 2024) |>
  select(circuitRef)

position_gaps <- f1data |>
  filter(year > 2014) |> # hybrid era onwards, to try and keep it relevant to modern F1 car sizes
  merge(circuits2024) |> # circuits from 2024 only
  select(year, round, circuitRef, constructorRef, driverRef, position) |>
  filter(position != "\\N") |> # ignore DNFs
  mutate(position = as.integer(position)) |>
  group_by(year, round, circuitRef, constructorRef) |>
  filter(n() == 2) |> # must be a double finish
  summarise(gap = max(position) - min(position), max_position = max(position)) |>
  group_by(circuitRef) |>
  mutate(median_gap = median(gap))

ggplot(position_gaps) +
  aes(x = gap) +
  facet_wrap(vars(circuitRef)) +
  geom_histogram(fill = "dodgerblue1", alpha = 0.5, colour = "black",
                 linewidth = 0.1, binwidth = 1) +
  geom_vline(mapping = aes(xintercept = median_gap), colour = "dodgerblue4")
```
Figure 8: Distribution of finishing position gaps between teammates by circuit, with median line.
Perhaps unsurprisingly given that F1 isn’t a spec series, the gap between drivers of the same car is usually very low. I went back and forth on whether to discard cars that finished outside the top ten points-paying positions, as this often includes cars that were retired to save expense or were wounded but finished the race well below their potential for the sake of data gathering. In the end I kept them in because we need two-car finishes to calculate the statistics and setting a threshold would mean re-coding all the finish positions.
It turns out to make little difference anyway. The circuit with the highest probability of teammates finishing in consecutive positions is Suzuka, whether or not you apply a threshold.
Table 6: Results for Suzuka 2024 (two-car finishes only)
| year | round | circuitRef | constructorRef | driverRef | quali_position | position |
|------|-------|------------|----------------|-----------------|----------------|----------|
| 2024 | 4 | suzuka | red_bull | max_verstappen | 1 | 1 |
| 2024 | 4 | suzuka | red_bull | perez | 2 | 2 |
| 2024 | 4 | suzuka | ferrari | sainz | 4 | 3 |
| 2024 | 4 | suzuka | ferrari | leclerc | 8 | 4 |
| 2024 | 4 | suzuka | mclaren | norris | 3 | 5 |
| 2024 | 4 | suzuka | aston_martin | alonso | 5 | 6 |
| 2024 | 4 | suzuka | mercedes | russell | 9 | 7 |
| 2024 | 4 | suzuka | mclaren | piastri | 6 | 8 |
| 2024 | 4 | suzuka | mercedes | hamilton | 7 | 9 |
| 2024 | 4 | suzuka | haas | hulkenberg | 12 | 11 |
| 2024 | 4 | suzuka | aston_martin | stroll | 16 | 12 |
| 2024 | 4 | suzuka | haas | kevin_magnussen | 18 | 13 |
| 2024 | 4 | suzuka | alpine | ocon | 15 | 15 |
| 2024 | 4 | suzuka | alpine | gasly | 17 | 16 |
The cars are quite well in order here, if you ignore the gulf between Alonso and Stroll. It looks like Suzuka is really a car-limited track. Incidentally, this would suggest that in round 4, McLaren and Mercedes were quite evenly matched.
Counter-arguments
Are we sure Suzuka is a car-limited track?
Suzuka is a track “drivers love”! Surely it’s a more skill-dependent track?
I would have expected so! Suzuka is described as a technical track, with minimal room for error. Perhaps there aren’t as many potential lines that a driver could take to distinguish themselves. It’s also a good track for overtaking, so any driver that does qualify out of position is able to recover. Or perhaps the reason drivers love it is because they get to drive their cars on the limit: the limit of the car’s performance.
Either way, statistically drivers are more likely to end up in pairs at Suzuka than anywhere else.
Suzuka was an early round. Perhaps car performance converges over the season?
On the one hand we’d expect to see cars start at different levels but converge throughout the season, as the teams spot innovations on other cars that they can copy. Or secret design documents that they can copy, ahem. On the other hand, constructors in a tight battle would be motivated to develop their cars further - for example in 2022 when Red Bull overhauled an early Ferrari advantage.
As it happens, historically Suzuka has been mid-to-late season, which suggests that being an early round in 2024 isn’t the reason for being car-limited.
However the fact that it was an early round in 2024 does mean that car performance will have changed over the season. Suzuka can only ever be a point-in-time assessment.
Are we sure that consecutive finishes aren’t indicative of similar driver skill levels?
The top teams can also hire the best drivers, so is it not possible that we’re still seeing the drivers finishing in skill order? I.e. it’s actually driver-limited, but the drivers happen to be mostly paired with similarly fast drivers - Alonso and Stroll being the clear exception in the 2024 race results.
On the other hand, between 2023 and 2024 there were no line-up changes among the top 5 teams, so we can easily compare across the two years.
Table 8: Results for Suzuka 2023 (two-car finishes only)
| year | round | circuitRef | constructorRef | driverRef | quali_position | position |
|------|-------|------------|----------------|-----------------|----------------|----------|
| 2023 | 16 | suzuka | mclaren | norris | 3 | 2 |
| 2023 | 16 | suzuka | mclaren | piastri | 2 | 3 |
| 2023 | 16 | suzuka | ferrari | leclerc | 4 | 4 |
| 2023 | 16 | suzuka | mercedes | hamilton | 7 | 5 |
| 2023 | 16 | suzuka | ferrari | sainz | 6 | 6 |
| 2023 | 16 | suzuka | mercedes | russell | 8 | 7 |
| 2023 | 16 | suzuka | alpine | ocon | 14 | 9 |
| 2023 | 16 | suzuka | alpine | gasly | 12 | 10 |
| 2023 | 16 | suzuka | alphatauri | lawson | 11 | 11 |
| 2023 | 16 | suzuka | alphatauri | tsunoda | 9 | 12 |
| 2023 | 16 | suzuka | haas | hulkenberg | 18 | 14 |
| 2023 | 16 | suzuka | haas | kevin_magnussen | 15 | 15 |
The order of constructors is very different, despite all drivers being the same except for Alpha Tauri/RB. Just for fun, here’s a side-by-side comparison for each driver for 2023 and 2024.
Code
```r
s24 <- suzuka2024 |>
  select(constructorRef, driverRef, position) |>
  rename(position2024 = position)

s23 <- suzuka2023 |>
  # correct to the new names for joining
  # (yes I could have used constructorId, but opaque IDs are so tedious to handle)
  mutate(constructorRef = ifelse(constructorRef == "alphatauri", "rb", constructorRef)) |>
  mutate(constructorRef = ifelse(constructorRef == "alfa", "sauber", constructorRef)) |>
  select(constructorRef, driverRef, position) |>
  rename(position2023 = position)

merge(s23, s24, all = TRUE) |>
  arrange(position2023) |>
  kable()
```
Table 9: Drivers’ results for Suzuka in 2023 and 2024
| constructorRef | driverRef | position2023 | position2024 |
|----------------|-----------------|--------------|--------------|
| red_bull | max_verstappen | 1 | 1 |
| mclaren | norris | 2 | 5 |
| mclaren | piastri | 3 | 8 |
| ferrari | leclerc | 4 | 4 |
| mercedes | hamilton | 5 | 9 |
| ferrari | sainz | 6 | 3 |
| mercedes | russell | 7 | 7 |
| aston_martin | alonso | 8 | 6 |
| alpine | ocon | 9 | 15 |
| alpine | gasly | 10 | 16 |
| rb | lawson | 11 | NA |
| rb | tsunoda | 12 | 10 |
| sauber | zhou | 13 | NA |
| haas | hulkenberg | 14 | 11 |
| haas | kevin_magnussen | 15 | 13 |
| aston_martin | stroll | NA | 12 |
| rb | ricciardo | NA | NA |
| red_bull | perez | NA | 2 |
| sauber | bottas | NA | 14 |
| williams | albon | NA | NA |
| williams | sargeant | NA | 17 |
Don’t certain tracks favour certain cars (e.g. due to having fewer slow corners)?
This is a valid objection. We can only really say that this reflects car performance at tracks like Suzuka. Suzuka is a high-speed, high-downforce track with more high-speed than low-speed corners. You can kind of tell by looking that only corners 2, 9, 11, and 14 are slow.
So perhaps another way of looking at it is that low-speed corners are where drivers make the most difference. That aligns with the excellent recent analysis by Mark Hughes where he explains how drivers like Leclerc and Hamilton make up their time in slow corners.
Naturally certain cars will also be better at slow corners, and that’s not represented at Suzuka. We might be able to gain a better understanding with sector times, but sadly I don’t have this data.
Suzuka-based priors
At this point I’m convinced enough that Suzuka is a useful indicator of relative car performance to be a prior, so let’s take the position of each team’s fastest car round Suzuka as constructor performance priors. Luckily there weren’t any double-DNFs - at least one car from each team finished - so every team is represented.
The Stan model is the same; we’re just injecting different prior \(\mu\)s and SDs.
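Constructing the priors is a one-liner from the earlier Suzuka data. A sketch, reusing the `suzuka2024` data frame from above; the SD value is mine for illustration:

```r
suzuka_priors <- suzuka2024 |>
  group_by(constructorRef) |>
  summarise(mu = min(position)) |> # each team's fastest car round Suzuka
  mutate(sd = 1)                   # illustrative: tighter than the budget priors
```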
Figure 9: Posterior samples for the Suzuka-position priors.
It’s not dramatically different. It shouldn’t be: the model and data are the same, and the priors are very similar to the budget-based priors - the fun was in figuring out that Suzuka is car-limited.
Figure 10: Posterior samples alongside the Suzuka-position priors.
Most teams didn’t move far from their priors. I’d draw similar conclusions to before: the McLaren was fastest, Red Bull was a close second, and Ferrari were not far behind.
Table 10: Mean and SD from posterior sample, with Suzuka priors.
| constructorRef | mu | sigma |
|----------------|------|-------|
| red_bull | 2.01 | 0.31 |
| mclaren | 2.05 | 0.32 |
| ferrari | 2.42 | 0.32 |
| mercedes | 3.39 | 0.39 |
| aston_martin | 6.07 | 0.49 |
| rb | 7.47 | 0.56 |
| haas | 7.77 | 0.58 |
| alpine | 8.55 | 0.64 |
| sauber | 9.08 | 0.52 |
| williams | 9.39 | 0.44 |
Reintroducing driver skill
Simple model achieved; let’s push on to something more complex, i.e. the model described at the start. Here it is:
data {
int<lower=1> n_ctrs;
int<lower=1> n_drvs;
int<lower=1> n_obs;
array[n_ctrs] real ctr_mus;
real<lower=0> ctr_sd;
array[n_obs] int<lower=1, upper=20> positions;
array[n_obs, 2] int position_indices;
}
parameters {
vector[n_ctrs] lambda_ctrs_raw;
vector[n_drvs] lambda_drvs_raw;
}
transformed parameters {
vector[n_ctrs] lambda_ctrs;
vector[n_drvs] lambda_drvs;
// Apply constraints to ensure identifiability
lambda_ctrs = log(20) * inv_logit(lambda_ctrs_raw);
lambda_drvs = log(20) * inv_logit(lambda_drvs_raw);
}
model {
lambda_ctrs_raw ~ normal(ctr_mus, ctr_sd);
lambda_drvs_raw ~ std_normal();
for (k in 1 : n_obs) {
int i = position_indices[k, 1];
int j = position_indices[k, 2];
positions[k] ~ poisson(exp(lambda_ctrs[i] + lambda_drvs[j]));
}
}
There is a crucial change. To ensure the model is identifiable, the raw skill parameters are squashed into (0, 1) with an inverse-logit transform and scaled by \(\log 20\). The transformed parameters are then summed and exponentiated to get the \(\lambda\) for the Poisson distribution.
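As a quick sanity check on the scales this implies, for either raw parameter:

\[
\lambda = \log(20)\cdot\operatorname{logit}^{-1}(\lambda_{\text{raw}}) \in (0, \log 20),
\qquad e^{\lambda} \in (1, 20),
\]

so the Poisson rate factorises as \(e^{\lambda_{\text{ctr}}+\lambda_{\text{drv}}} = e^{\lambda_{\text{ctr}}}\, e^{\lambda_{\text{drv}}}\), with each factor bounded rather than free to trade off against the other without limit.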
Table 11: Constructor performance means and SDs from the posterior.
| constructorRef | mean_performance | sd | rank |
|----------------|------------------|------|------|
| red_bull | 0.57 | 0.19 | 1 |
| ferrari | 0.70 | 0.21 | 2 |
| mclaren | 0.75 | 0.23 | 3 |
| mercedes | 0.95 | 0.26 | 4 |
| aston_martin | 1.18 | 0.36 | 5 |
| haas | 1.31 | 0.28 | 6 |
| rb | 1.34 | 0.34 | 7 |
| alpine | 1.55 | 0.31 | 8 |
| williams | 1.62 | 0.30 | 9 |
| sauber | 1.66 | 0.36 | 10 |
Note that the sampled parameter is now latent performance, not positions, so the numbers don’t have a real-world meaning. What matters is relative performance, where a lower number is better.
The model suggests - controversially, but with appropriate uncertainty - that the Red Bull was the fastest car, followed by Ferrari and then the McLaren. However the relative positioning of the other teams seems about right.
Forgive me for not labelling all the drivers on this next plot, but there’s only so much plot fiddling I’m willing to do and it’s usually obvious which is which.
Table 12: Driver performance means and SDs from the posterior.
| driverRef | mean_performance | sd | rank |
|-----------------|------------------|------|------|
| max_verstappen | 0.55 | 0.20 | 1 |
| leclerc | 0.68 | 0.22 | 2 |
| norris | 0.71 | 0.24 | 3 |
| russell | 0.78 | 0.26 | 4 |
| colapinto | 0.81 | 0.31 | 5 |
| hamilton | 0.83 | 0.26 | 6 |
| sainz | 0.86 | 0.22 | 7 |
| piastri | 0.88 | 0.24 | 8 |
| gasly | 0.89 | 0.32 | 9 |
| albon | 0.93 | 0.31 | 10 |
| ocon | 1.03 | 0.31 | 11 |
| hulkenberg | 1.06 | 0.29 | 12 |
| zhou | 1.06 | 0.36 | 13 |
| bottas | 1.08 | 0.36 | 14 |
| alonso | 1.11 | 0.36 | 15 |
| bearman | 1.12 | 0.29 | 16 |
| tsunoda | 1.13 | 0.34 | 17 |
| doohan | 1.16 | 0.37 | 18 |
| sargeant | 1.19 | 0.31 | 19 |
| ricciardo | 1.20 | 0.34 | 20 |
| kevin_magnussen | 1.25 | 0.29 | 21 |
| lawson | 1.26 | 0.35 | 22 |
| stroll | 1.36 | 0.36 | 23 |
| perez | 1.49 | 0.21 | 24 |
The model suggests that the top drivers performed similarly, with Verstappen fastest. Poor Perez comes dead last, because the model has decided that the Red Bull was the fastest car.
There are some obvious issues though. Alonso is considered only the 15th best performer out of 24, which is absurd. This could be because the Suzuka prior placed Aston Martin ahead of Mercedes whereas they actually finished the season well behind.
If the informative prior is so bad, why don’t we just use uninformative priors? If we do that, the model will not be willing or able to distinguish the contribution of the cars from that of the drivers. We need a better prior.
There will also be some effect from the way I’ve simply excluded DNFs rather than penalising driver-fault DNFs.
Weaknesses of this model
To an extent I’m just role-playing being Dr Phillips here, so your number one takeaway should be: prefer the F1 Metrics model or the Bell et al. (2016) model, which is a much more sophisticated Bayesian model. If you are seriously interested in estimating driver performance, please check those out.
Nonetheless, there are some obvious weaknesses in this model that it’s worth highlighting.
Use of positions rather than points
Phillips makes a good argument for a non-linear reward function, since it punishes drivers more fairly for DNFs. That would be an improvement here.
No assignment of DNF blame
I’ve just discarded all that data, which flatters the DNF-prone drivers. Though I note that the drivers near the bottom of the performance rankings in my model were more DNF-prone in 2024 anyway, so perhaps there’s a correlation between low performance and DNFs, or those drivers had some races where they wounded the car but finished anyway.
Only a single season considered
With more data for each driver - across different cars - the model would be better able to distinguish driver and car performance. Note that it would be necessary to model each constructor by year because the car performance varies so much.
This suggests that we might benefit from having multiple representations of car performance within each year, e.g. splitting each team into A and B specs according to when they brought their largest upgrade. The challenge with that (apart from the tedious business of determining the break points) is that we’d have less data for each car and also there are perhaps several step changes in performance throughout the season.
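If you did want to try it, the data manipulation is straightforward. A sketch, with `upgrade_rounds` as a hypothetical hand-curated table of when each team brought its biggest upgrade:

```r
# upgrade_rounds: constructorRef, upgrade_round (hand-curated, hypothetical)
f2024_specs <- f2024 |>
  left_join(upgrade_rounds, by = "constructorRef") |>
  mutate(ctr_spec = paste0(constructorRef,
                           ifelse(round < upgrade_round, "_A", "_B")))
```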
Summing it all up
What did we learn?
How to use Stan to sample the posterior for a Bayesian model in R.
Very little about F1.
Perhaps point 2 is under-selling it a little, since we did learn that Suzuka is quite good for assessing relative constructor performance, and the model turned out some reasonable results for car and driver performance. There are some major caveats of course, but it is the kind of model we could build on. An obvious future step is to look at the other car-limited tracks - hopefully some that fall at a different point in the season and have more slow corners than Suzuka - and make the prior a combination of results at these tracks. If I try that out in future I’ll be sure to post an update here.