Latent F1 team and driver performance: fun with Bayesian models

Categories: bayesian, r, f1, stan

Published: December 27, 2024

For a bit of fun, let’s try and model the performance of F1 drivers and constructors in the 2024 season using Bayesian models. The aim is more to learn about using Stan than to determine deep truths about F1, since there’s far more to performance than we can easily model.

We’ll vaguely imitate the f1-metrics model, which I used to await eagerly after each season. In that model there are latent variables for driver skill and car quality, which predict points/positions. However, I never did get hold of the paper behind it, so instead I’m following the amazing Rasmus Bååth’s blog, where he models the latent skill of football teams in La Liga.

I gathered CSVs of race data from the fantastic Ergast. The weapon of choice is R and the tidyverse, since nothing else is quite so ergonomic for this type of analysis. You can find the code behind the folds if you want to play along.

Code
library(tidyverse)
library(knitr) # this lib is for rendering tables nicely

qualifying <- read_csv("data/qualifying.csv") |> rename(quali_position = position)
races <- read_csv("data/races.csv") |>
  select(raceId, year, round, circuitId, name) |>
  rename(gp = name)
results <- read_csv("data/results.csv") |> select(resultId, raceId, driverId, constructorId, grid, position, points, milliseconds)
constructors <- read_csv("data/constructors.csv") |> select(constructorId, constructorRef)
drivers <- read_csv("data/drivers.csv") |> select(driverId, driverRef)
circuits <- read_csv("data/circuits.csv") |> select(circuitId, circuitRef)

ctr_drvs <- tribble(
  ~constructorRef, ~driverRef,
  "alpine", "doohan",
  "alpine", "gasly",
  "alpine", "ocon",
  "aston_martin", "alonso",
  "aston_martin", "stroll",
  "ferrari", "bearman",
  "ferrari", "leclerc",
  "ferrari", "sainz",
  "haas", "bearman",
  "haas", "hulkenberg",
  "haas", "kevin_magnussen",
  "mclaren", "norris",
  "mclaren", "piastri",
  "mercedes", "hamilton",
  "mercedes", "russell",
  "rb", "lawson",
  "rb", "ricciardo",
  "rb", "tsunoda",
  "red_bull", "max_verstappen",
  "red_bull", "perez",
  "sauber", "bottas",
  "sauber", "zhou",
  "williams", "albon",
  "williams", "colapinto",
  "williams", "sargeant",
)

ctr_colours <- tribble(
  ~constructorRef, ~colour,
  "red_bull", "#3671c6",
  "mercedes", "#26f5d2",
  "ferrari", "#e80220",
  "mclaren", "#ff8001",
  "aston_martin", "#229971",
  "alpine", "#F363B9", # Should be "#0093cc" but there's too much blue
  "williams", "#63c4ff",
  "rb", "#6592ff",
  "sauber", "#52e252",
  "haas", "#b6babd"
) |>
  arrange(constructorRef)

drv_colours <- ctr_colours |>
  merge(ctr_drvs) |>
  group_by(driverRef) |>
  arrange(constructorRef) |>
  slice_head(n = 1) |> # alphabetical first for multi-team drivers i.e. Bearman
  select(driverRef, colour) |>
  arrange(driverRef)

f1data <- races |>
  merge(results) |>
  merge(qualifying) |>
  merge(constructors) |>
  merge(drivers) |>
  merge(circuits) |>
  select(raceId, gp, year, round, circuitRef, constructorRef, driverRef, q1, q2, q3, quali_position, grid, position) |>
  mutate(position = ifelse(position == "\\N", NA, as.numeric(position)))

In the interests of being educational - and because Lord knows I hate editing - I am writing this post as I go, including any blind alleys and debugging.

Let’s start with some visualisations of the data we have.

Code
f2024 <- f1data |> filter(year == 2024)

ggplot(f2024) +
  aes(x = position, fill = driverRef) +
  facet_wrap(~driverRef) +
  scale_fill_manual(values = drv_colours$colour) +
  geom_histogram(alpha = 0.5, colour = "black", linewidth = 0.1, binwidth = 1, show.legend = FALSE)
Figure 1: Histograms of driver performance in F1 2024.

The model goes like this. Points are simulated as a draw from a Poisson distribution parameterised by the driver performance and constructor performance.

\[ \text{points} \sim \text{Poisson}(e^{\text{perf}_{\text{drv}} + \text{perf}_{\text{ctr}}}) \]

Driver performance is simulated as a draw from a normal distribution.

\[ \text{perf}_{\text{drv}} \sim \text{Normal}(\mu_{\text{drv}}, \sigma_{\text{drv}}^2) \]

Likewise constructor performance is simulated as a draw from a normal distribution.

\[ \text{perf}_{\text{ctr}} \sim \text{Normal}(\mu_{\text{ctr}}, \sigma_{\text{ctr}}^2) \]

We will therefore have a likelihood function that is the product of the Poisson densities of the points for each driver in each race. The number of parameters is fairly high: a performance score for each driver and for each constructor.
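
To make this concrete, here’s a minimal simulation sketch of the generative model just described, with made-up hyperparameters (the real model will of course infer the performance scores rather than draw them from values I picked).

Code
set.seed(1)
n_ctrs_sim <- 10
ctr_perf <- rep(rnorm(n_ctrs_sim, mean = 0, sd = 0.5), each = 2) # perf_ctr, two drivers per car
drv_perf <- rnorm(2 * n_ctrs_sim, mean = 0, sd = 0.5) # perf_drv
sim_points <- rpois(length(drv_perf), lambda = exp(drv_perf + ctr_perf)) # points ~ Poisson(e^(perf_drv + perf_ctr))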

Constructor performance

Let’s start simpler, with a view of just the constructor performance. Assume that at least one driver for every constructor maximised the car’s performance each weekend. THIS ISN’T TRUE. If Albon had a poor weekend, Sargeant wasn’t going to step up, to give one of many examples from the 2024 grid. However it is a useful approximation that lets us analyse just the constructor performance.

Code
ctr_positions <- f2024 |>
  group_by(round, constructorRef) |>
  summarise(quali_position = min(quali_position), position = min(position)) |>
  mutate(
    # re-rank
    quali_position = rank(quali_position),
    position = rank(position)
  ) |>
  arrange(round, position)

ggplot(ctr_positions) +
  aes(x = position, fill = constructorRef) +
  facet_wrap(~constructorRef) +
  scale_fill_manual(values = ctr_colours$colour) +
  geom_histogram(alpha = 0.5, colour = "black", linewidth = 0.1, binwidth = 1, show.legend = FALSE)
Figure 2: Histogram of re-ranked constructor results in F1 2024.

A key point here is that we’ve recalculated positions using only each constructor’s best-placed driver, i.e. there are now only 10 finishing positions.

Again we can simulate the position \(p\) as a draw from a Poisson distribution, this time where \(\lambda_\text{ctr}\) is the performance of the constructor only.

\[ p \sim \text{Poisson}(\lambda_{\text{ctr}}) \]

To keep things simple, let’s start with an uninformative prior, a uniform distribution for the performance of the constructors.

\[ \lambda_{\text{ctr}} \sim \text{Uniform}(1, 10) \]

Then our likelihood function is much simpler. (We won’t actually need this, but I find it useful to write out for the sake of understanding.)
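
Concretely, with independent observations the log-likelihood is just a sum of Poisson log-densities over constructors and rounds:

\[ \log L(\lambda \mid p) = \sum_{\text{ctr}} \sum_{r} \log \text{Poisson}(p_{\text{ctr},r} \mid \lambda_{\text{ctr}}) \]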

Code
log_lik_ctr <- function(ctr_positions, ctr_perfs) {
  ctrs <- sort(unique(ctr_positions$constructorRef))
  log_lik <- 0.0 # a log-likelihood accumulates from zero, not one
  for (i in seq_along(ctrs)) {
    race_posns <- filter(ctr_positions, constructorRef == ctrs[i])$position
    log_lik <- log_lik + sum(dpois(race_posns, lambda = ctr_perfs[i], log = TRUE))
  }
  log_lik
}

Let’s throw Stan at this now. Here’s our Stan model file.

data {
  int<lower=1> n_ctrs;
  int<lower=1> n_obs;
  array[n_obs] int<lower=1, upper=10> ctrs;
  array[n_obs] int<lower=1, upper=10> positions;
}
parameters {
  array[n_ctrs] real<lower=1, upper=10> lambda;
}
model {
  lambda ~ uniform(1, 10);
  positions ~ poisson(lambda[ctrs]) T[1, 10];

  // the above "distribution" syntax is equivalent to:
  //target += uniform_lpdf(lambda | 1, 10);
  //target += poisson_lpmf(positions | lambda[ctrs]);
}
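
As an aside, the T[1, 10] suffix truncates the Poisson to the support 1-10, which (as I understand it) just renormalises the probability mass over that range. A quick R sketch of the same idea:

Code
# not part of the model: the truncated Poisson pmf that T[1, 10] implies,
# i.e. dpois renormalised to the support 1..10
dpois_trunc <- function(k, lambda, lo = 1, hi = 10) {
  dpois(k, lambda) / (ppois(hi, lambda) - ppois(lo - 1, lambda))
}
sum(dpois_trunc(1:10, lambda = 4)) # sums to 1 over the truncated support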

Below the fold is the R code for interacting with it via CmdStanR. I settled on CmdStanR rather than RStan after hitting too many nameless runtime errors through RStan - see here.

Code
library(cmdstanr, quietly = TRUE)
check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)

ctrs <- sort(unique(ctr_positions$constructorRef))
ctr_index <- tibble(
  constructorId = 1:length(ctrs),
  constructorRef = ctrs,
)

make_data_list <- function(ctr_positions) {
  tidy_ctr_posns <- ctr_positions |> merge(ctr_index)

  list(
    n_ctrs = n_distinct(tidy_ctr_posns$constructorId),
    n_obs = nrow(tidy_ctr_posns),
    ctrs = tidy_ctr_posns$constructorId,
    positions = tidy_ctr_posns$position
  )
}

run_stan <- function(data_list, model_file = "f1.stan") {
  mod <- cmdstan_model(model_file, exe_file = str_c(model_file, ".bin"))
  mod$sample(data_list, seed = 42, show_messages = FALSE)
}

fit <- run_stan(make_data_list(ctr_positions))
fit
  variable    mean  median   sd  mad      q5     q95 rhat ess_bulk ess_tail
 lp__      1066.12 1066.51 2.55 2.39 1061.39 1069.56 1.00     1354     1292
 lambda[1]    7.99    7.97 0.78 0.83    6.74    9.34 1.00     2800     1333
 lambda[2]    6.34    6.31 0.60 0.60    5.42    7.40 1.00     3941     2479
 lambda[3]    2.53    2.52 0.36 0.35    1.97    3.14 1.00     4395     2641
 lambda[4]    6.86    6.84 0.67 0.67    5.82    8.00 1.00     3546     1968
 lambda[5]    1.70    1.68 0.30 0.30    1.24    2.20 1.00     3741     1634
 lambda[6]    4.12    4.11 0.43 0.44    3.42    4.86 1.00     4405     3065
 lambda[7]    7.48    7.43 0.77 0.78    6.28    8.79 1.00     1509      649
 lambda[8]    3.62    3.61 0.42 0.41    2.96    4.33 1.00     4388     3150
 lambda[9]    9.49    9.60 0.43 0.38    8.65    9.97 1.00     2975     1581

 # showing 10 of 11 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)

Let’s tidy that up though, mapping parameters back to constructor names.

Code
sample_posterior <- function(fit) {
  df <- fit$draws("lambda", format = "draws_matrix") |> as_tibble()
  colnames(df) <- ctrs
  df |>
    pivot_longer(
      all_of(ctrs),
      names_to = "constructorRef", values_to = "position"
    ) |>
    mutate(position = as.double(position))
}

plot_posterior <- function(fit) {
  ggplot(sample_posterior(fit)) +
    aes(x = position, fill = constructorRef) +
    scale_fill_manual(values = ctr_colours$colour) +
    geom_density(alpha = 0.5, colour = "black", linewidth = 0.1)
}

plot_posterior(fit)
Figure 3: Posterior samples for constructor performance

On this plot lower x-values are better.

The interesting thing is that this suggests the Red Bull was clearly the third-fastest car. I’m a little suspicious of that, because it claimed a sequence of 1-2s in the first third of the season. Let’s take a closer look at that.

The first thing to check is whether my dumb “max driver, 10 positions” model is problematic. Let’s look at Red Bull’s results.

Code
f2024 |>
  filter(constructorRef == "red_bull") |>
  select(gp, round, driverRef, position) |>
  pivot_wider(names_from = "driverRef", values_from = "position") |>
  arrange(round) |>
  merge(
    ctr_positions |> filter(constructorRef == "red_bull") |> select(round, position)
  ) |>
  kable()
Table 1: Red Bull’s results in F1 2024
round gp perez max_verstappen position
1 Bahrain Grand Prix 2 1 1
2 Saudi Arabian Grand Prix 2 1 1
3 Australian Grand Prix 5 NA 10
4 Japanese Grand Prix 2 1 1
5 Chinese Grand Prix 3 1 1
6 Miami Grand Prix 4 2 2
7 Emilia Romagna Grand Prix 8 1 1
8 Monaco Grand Prix NA 6 10
9 Canadian Grand Prix NA 1 9
10 Spanish Grand Prix 8 1 1
11 Austrian Grand Prix 7 5 4
12 British Grand Prix 17 2 1
13 Hungarian Grand Prix 7 5 4
14 Belgian Grand Prix 7 4 3
15 Dutch Grand Prix 6 2 2
16 Italian Grand Prix 8 6 4
17 Azerbaijan Grand Prix 17 5 4
18 Singapore Grand Prix 10 2 2
19 United States Grand Prix 7 3 2
20 Mexico City Grand Prix 17 6 4
21 São Paulo Grand Prix 11 1 1
22 Las Vegas Grand Prix 10 5 3
23 Qatar Grand Prix NA 1 9
24 Abu Dhabi Grand Prix NA 6 8

Round three is clearly a mistake: how can Perez’s 5th place become the 10th constructor position? Let’s work through it again.

Code
f2024 |>
  group_by(round, constructorRef) |>
  summarise(position = min(position)) |>
  mutate(reranked = rank(position)) |>
  filter(round == 3) |>
  kable()
Table 2: Round 3 results.
round constructorRef position reranked
3 alpine 13 7
3 aston_martin 6 3
3 ferrari 1 1
3 haas 9 5
3 mclaren 3 2
3 mercedes NA 9
3 rb 7 4
3 red_bull NA 10
3 sauber 14 8
3 williams 11 6

NA strikes again! I forgot the crucial na.rm argument to min(), so any DNF turned the constructor’s best position into NA, which rank() then placed last. Right, a do-over.
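
For the record, here’s a two-line illustration (with made-up values) of what went wrong: min() propagates the NA from a DNF, and rank() then places the NA last, which in a 10-constructor table means 10th.

Code
min(c(5, NA)) # NA - one DNF wipes out the teammate's P5
rank(c(1, 3, NA, 2)) # 1 3 4 2 - rank() puts the NA last by default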

Code
ctr_positions <- f2024 |>
  group_by(round, constructorRef) |>
  summarise(
    # in the event of a double DNF, assign the last position
    quali_position = pmin(min(quali_position, na.rm = TRUE), 10),
    position = pmin(min(position, na.rm = TRUE), 10)
  ) |>
  mutate(
    # re-rank
    quali_position = rank(quali_position),
    position = rank(position)
  ) |>
  arrange(round, position)

ggplot(ctr_positions) +
  aes(x = position, fill = constructorRef) +
  facet_wrap(~constructorRef) +
  scale_fill_manual(values = ctr_colours$colour) +
  geom_histogram(alpha = 0.5, colour = "black", linewidth = 0.1, binwidth = 1, show.legend = FALSE)
Figure 4: Constructor position histograms after fixing NA sorting.

That looks more like I’d expect. Let’s try Stan again.

Code
fit <- run_stan(make_data_list(ctr_positions))
plot_posterior(fit)
Figure 5: Posterior samples after fixing NA sorting.

That is more intuitive: Red Bull is now much closer to McLaren and marginally ahead of Ferrari (despite coming third in the constructors’ championship). This tells us that yes, Verstappen probably would have won the drivers’ championship in a McLaren - but contrary to claims, not in a Ferrari, since it wasn’t faster than the Red Bull overall.

The three teams that are on top of each other are: Alpine, Haas, and RB. You can also just about discern that the model is less sure about the top teams, since their posteriors are slightly wider.

There’s a lot we can improve about this model. The most obvious issue is that our priors for constructor performance are uninformative, and so our posteriors basically just reflect the data. Can we add some useful information to inform the model?

More informative priors

What makes some constructors better than others? Adrian Newey aside: money. Let’s reflect this in our priors.

\[ \lambda_{\text{ctr}} \sim \text{Normal}(\text{budget}_\text{ctr}, \sigma) \]

Budget data

This is a bit of a secret, so we have to estimate. We do know that the budget cap is 135M USD and it’s unlikely that anyone is operating below the cap. We also know that driver and top executive salaries are exempt from the cap. The top teams are spending about 100M USD on their drivers, and prior to the budget cap’s implementation likely invested in facilities that they can use for free (like wind tunnels).

With that in mind, I asked GPT-4o to search the web and estimate budgets for the teams. It came back with these results, which I sanity checked against this Blackbook Motorsport article.

Code
budgets <- tribble(
  ~constructorRef, ~budget,
  "red_bull", 400,
  "mercedes", 400,
  "ferrari", 400,
  "mclaren", 250,
  "aston_martin", 250,
  "alpine", 200,
  "rb", 150,
  "haas", 150,
  "williams", 150,
  "sauber", 150
) |>
  mutate(
    norm_budget = 1 - budget / max(budget)
  )
kable(select(budgets, -norm_budget))
Table 3: Constructor budget estimates (USD millions)
constructorRef budget
red_bull 400
mercedes 400
ferrari 400
mclaren 250
aston_martin 250
alpine 200
rb 150
haas 150
williams 150
sauber 150

Let’s update our model and run Stan again.

data {
  int<lower=1> n_ctrs;
  int<lower=1> n_obs;
  array[n_ctrs] real<lower=1, upper=10> prior_mus; // real, since the budget-based prior means are fractional
  real<lower=0> prior_sd;
  array[n_obs] int<lower=1, upper=10> ctrs;
  array[n_obs] int<lower=1, upper=10> positions;
}
parameters {
  array[n_ctrs] real<lower=1, upper=10> lambda;
}
model {
  lambda ~ normal(prior_mus, prior_sd);
  positions ~ poisson(lambda[ctrs]) T[1, 10];
}
Code
data_list_b <- make_data_list(ctr_positions)
# map the normalised budgets onto the model's 1-10 position scale (biggest budget -> prior mean of 1)
data_list_b$prior_mus <- 1 + 10 * arrange(budgets, constructorRef)$norm_budget
data_list_b$prior_sd <- 1.5
fit.b <- run_stan(data_list_b, model_file = "f1.b.stan")

plot_posterior(fit.b)
Figure 6: Posterior samples from the model with budget-based priors.

This hasn’t made much difference. McLaren has eased back towards Red Bull because their budget is smaller. You could say they did a lot better than Red Bull relative to their budget.

Code
sp <- sample_posterior(fit.b)
sp |>
  group_by(constructorRef) |>
  summarise(mu = mean(position), sigma = sd(position)) |>
  arrange(mu) |>
  kable(digits = 2)
Table 4: Mean and SD from posterior sample.
constructorRef mu sigma
mclaren 2.06 0.33
red_bull 2.07 0.32
ferrari 2.40 0.32
mercedes 3.02 0.36
aston_martin 6.42 0.56
alpine 7.80 0.66
haas 8.00 0.67
rb 8.07 0.67
williams 8.82 0.63
sauber 9.12 0.56

This model suggests there’s very little in it between Red Bull and McLaren, though the posterior distribution is wider for the top three teams than the others, so there is more uncertainty.

Was budget a bad choice for the prior? Not necessarily: this model of constructor performance incorporates an important factor that isn’t clear from the results, but which should affect an objective assessment of which teams are better. Or to put it another way, if I were Max Verstappen I’d still rather be driving the Red Bull than the McLaren next year, because Red Bull’s budget suggests a more capable team overall. (Not to mention the increased wind tunnel time that third place in the constructors’ gets relative to the winner.)

Code
plot_posterior_with_prior <- function(ctrs, prior_mus, prior_sd, fit) {
  s_post <- sample_posterior(fit) |> mutate(sample = "posterior")

  ctr_index <- tibble(constructorRef = ctrs, constructorId = 1:length(ctrs))

  s_prior <- prior_mus |>
    lapply(\(mu) tibble(position = rnorm(100000, mu, prior_sd))) |>
    imap(\(t, i) mutate(t, constructorId = i, sample = "prior")) |>
    bind_rows() |>
    merge(ctr_index)

  df <- bind_rows(s_prior, s_post)

  ggplot(df) +
    aes(x = position, fill = sample) +
    facet_wrap(~constructorRef) +
    geom_density(alpha = 0.5, linewidth = 0.1) +
    scale_x_continuous(breaks = seq(1, 10, by = 2), limits = c(1, 10)) +
    scale_fill_manual(values = c("dodgerblue1", "dodgerblue4"))
}

plot_posterior_with_prior(ctrs, data_list_b$prior_mus, data_list_b$prior_sd, fit.b)
Figure 7: Comparison of prior and posterior samples.

Plotting the priors and posteriors together shows how McLaren over-performed and Mercedes continued to underperform.

Past performance

Another, very easy way to set the priors is to use the constructors’ performance over the previous seasons under the same regulations, in this case 2022 and 2023 (the ground-effect era). We would just take a weighted mean of each constructor’s finishing positions as the prior mean, and perhaps set the variance in proportion to how much its position changed between the two seasons. Not particularly innovative or interesting though, so I’ll only sketch it below and move on.
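
For completeness, here’s a rough, unrun sketch of what that would look like, reusing the f1data frame from above and mapping the 2022-2023 team names onto their 2024 equivalents. The double weighting of 2023 and the variance rule are arbitrary choices for illustration.

Code
past_ctr_positions <- f1data |>
  filter(year %in% c(2022, 2023)) |>
  mutate(constructorRef = case_match(
    constructorRef,
    "alphatauri" ~ "rb",
    "alfa" ~ "sauber",
    .default = constructorRef
  )) |>
  group_by(year, round, constructorRef) |>
  summarise(position = min(position, na.rm = TRUE), .groups = "drop_last") |>
  mutate(position = rank(position)) |> # re-rank to 1-10 within each round
  ungroup()

past_priors <- past_ctr_positions |>
  group_by(year, constructorRef) |>
  summarise(mean_position = mean(position), .groups = "drop") |>
  group_by(constructorRef) |>
  summarise(
    prior_mu = weighted.mean(mean_position, w = ifelse(year == 2023, 2, 1)), # weight 2023 double
    prior_sd = pmax(1, abs(diff(mean_position))) # wider where the two seasons disagree
  )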

Car-limited circuits

Long-time F1 fans will know that at some tracks, the cars almost always finish in pairs, e.g. both Ferraris, then both Mercedes, and so on. Somehow at these circuits driver skill makes little difference; we could say the circuits are car-limited rather than driver-limited. That’s no fun to watch, but I wonder if it’s useful for establishing car performance?

Let’s look back at historic race results to see if some circuits are more likely to finish in two-by-two order. We can score this by the median difference in finishing position between teammates, counting only races where both cars finished.

Code
circuits2024 <- races |>
  merge(circuits) |>
  filter(year == 2024) |>
  select(circuitRef)

position_gaps <- f1data |>
  filter(year > 2014) |> # hybrid era onwards, to try and keep it relevant to modern F1 car sizes
  merge(circuits2024) |> # circuits from 2024 only
  select(year, round, circuitRef, constructorRef, driverRef, position) |>
  filter(position != "\\N") |> # ignore DNFs
  mutate(position = as.integer(position)) |>
  group_by(year, round, circuitRef, constructorRef) |>
  filter(n() == 2) |> # must be a double finish
  summarise(gap = max(position) - min(position), max_position = max(position)) |>
  group_by(circuitRef) |>
  mutate(median_gap = median(gap))

ggplot(position_gaps) +
  aes(x = gap) +
  facet_wrap(vars(circuitRef)) +
  geom_histogram(fill = "dodgerblue1", alpha = 0.5, colour = "black", linewidth = 0.1, binwidth = 1) +
  geom_vline(mapping = aes(xintercept = median_gap), colour = "dodgerblue4")
Figure 8: Distribution of finishing position gaps between teammates by circuit, with median line.

Perhaps unsurprisingly given that F1 isn’t a spec series, the gap between drivers of the same car is usually very low. I went back and forth on whether to discard cars that finished outside the top ten points-paying positions, as this often includes cars that were retired to save expense or were wounded but finished the race well below their potential for the sake of data gathering. In the end I kept them in because we need two-car finishes to calculate the statistics and setting a threshold would mean re-coding all the finish positions.

It turns out to make little difference anyway. The circuit with the highest probability of teammates finishing in consecutive positions is Suzuka, whether or not you apply a threshold.

Code
rounds2024 <- races |>
  filter(year == 2024) |>
  merge(circuits) |>
  select(circuitRef, round) |>
  rename(round2024 = round)

p_gap_1 <- position_gaps |>
  count(circuitRef, gap) |>
  group_by(circuitRef) |>
  mutate(p_gap = n / sum(n)) |>
  filter(gap == 1) |>
  ungroup() |>
  merge(rounds2024)

p_gap_1 |>
  slice_max(p_gap, n = 10) |>
  kable(digits = 2)
Table 5: Circuits with highest probability of having teammates finishing together.
circuitRef gap n p_gap round2024
suzuka 1 29 0.48 4
albert_park 1 19 0.41 3
rodriguez 1 26 0.39 20
jeddah 1 10 0.38 2
yas_marina 1 29 0.36 24
imola 1 10 0.33 7
hungaroring 1 26 0.32 13
shanghai 1 15 0.31 5
bahrain 1 25 0.30 1
interlagos 1 19 0.30 21

How did the cars line up at Suzuka in 2024, noting that this was round 4 of a 24-round championship?

Code
suzuka2024 <- f1data |>
  filter(year == 2024) |>
  filter(circuitRef == "suzuka") |>
  select(year, round, circuitRef, constructorRef, driverRef, quali_position, position)

suzuka2024 |>
  drop_na() |>
  group_by(constructorRef) |>
  filter(n() == 2) |>
  arrange(position) |>
  kable()
Table 6: Results for Suzuka 2024 (two-car finishes only)
year round circuitRef constructorRef driverRef quali_position position
2024 4 suzuka red_bull max_verstappen 1 1
2024 4 suzuka red_bull perez 2 2
2024 4 suzuka ferrari sainz 4 3
2024 4 suzuka ferrari leclerc 8 4
2024 4 suzuka mclaren norris 3 5
2024 4 suzuka aston_martin alonso 5 6
2024 4 suzuka mercedes russell 9 7
2024 4 suzuka mclaren piastri 6 8
2024 4 suzuka mercedes hamilton 7 9
2024 4 suzuka haas hulkenberg 12 11
2024 4 suzuka aston_martin stroll 16 12
2024 4 suzuka haas kevin_magnussen 18 13
2024 4 suzuka alpine ocon 15 15
2024 4 suzuka alpine gasly 17 16

The cars are quite well in order here, if you ignore the gulf between Alonso and Stroll. It looks like Suzuka is really a car-limited track. Incidentally, this would suggest that in round 4, McLaren and Mercedes were quite evenly matched.

Counter-arguments

Are we sure Suzuka is a car-limited track?

Suzuka is a track “drivers love”! Surely it’s a more skill-dependent track?

I would have expected so! Suzuka is described as a technical track, with minimal room for error. Perhaps there aren’t as many potential lines that a driver could take to distinguish themselves. It’s also a good track for overtaking, so any driver that does qualify out of position is able to recover. Or perhaps the reason drivers love it is because they get to drive their cars on the limit: the limit of the car’s performance.

Either way, statistically drivers are more likely to end up in pairs in Suzuka than anywhere else.

Suzuka was an early round. Perhaps car performance converges over the season?

On the one hand we’d expect to see cars start at different levels but converge throughout the season, as the teams spot innovations on other cars that they can copy. Or secret design documents that they can copy, ahem. On the other hand, constructors in a tight battle would be motivated to develop their cars further - for example in 2022 when Red Bull overhauled an early Ferrari advantage.

As it happens, Suzuka has historically been a mid-to-late-season round, which suggests that its early slot in 2024 isn’t why it looks car-limited.

Code
races |>
  filter(year >= 2014) |>
  group_by(year) |>
  mutate(n_rounds = n_distinct(round)) |>
  ungroup() |>
  merge(circuits) |>
  filter(circuitRef == "suzuka") |>
  select(year, round, n_rounds) |>
  arrange(year, round) |>
  kable()
Table 7: Suzuka’s placement in each season
year round n_rounds
2014 15 19
2015 14 19
2016 17 21
2017 16 20
2018 17 21
2019 17 21
2022 18 22
2023 16 22
2024 4 24

However the fact that it was an early round in 2024 does mean that car performance will have changed over the season. Suzuka can only ever be a point-in-time assessment.

Are we sure that consecutive finishes aren’t indicative of similar driver skill levels?

The top teams can also hire the best drivers, so is it not possible that we’re still seeing the drivers finish in skill order? That is, the circuit is actually driver-limited, but most drivers happen to be paired with similarly fast teammates - Alonso and Stroll being the clear exception in the 2024 race results.

On the other hand, between 2023 and 2024 there were no line-up changes among the top 5 teams, so we can easily compare across the two years.

Code
suzuka2023 <- f1data |>
  filter(year == 2023) |>
  filter(circuitRef == "suzuka") |>
  select(year, round, circuitRef, constructorRef, driverRef, quali_position, position)

suzuka2023 |>
  drop_na() |>
  group_by(constructorRef) |>
  filter(n() == 2) |>
  arrange(position) |>
  kable()
Table 8: Results for Suzuka 2023 (two-car finishes only)
year round circuitRef constructorRef driverRef quali_position position
2023 16 suzuka mclaren norris 3 2
2023 16 suzuka mclaren piastri 2 3
2023 16 suzuka ferrari leclerc 4 4
2023 16 suzuka mercedes hamilton 7 5
2023 16 suzuka ferrari sainz 6 6
2023 16 suzuka mercedes russell 8 7
2023 16 suzuka alpine ocon 14 9
2023 16 suzuka alpine gasly 12 10
2023 16 suzuka alphatauri lawson 11 11
2023 16 suzuka alphatauri tsunoda 9 12
2023 16 suzuka haas hulkenberg 18 14
2023 16 suzuka haas kevin_magnussen 15 15

The order of constructors is very different, despite all the drivers being the same except at AlphaTauri/RB. Just for fun, here’s a side-by-side comparison for each driver for 2023 and 2024.

Code
s24 <- suzuka2024 |>
  select(constructorRef, driverRef, position) |>
  rename(position2024 = position)

s23 <- suzuka2023 |>
  # correct to the new names for joining
  # (yes I could have used constructorId, but opaque IDs are so tedious to handle)
  mutate(constructorRef = ifelse(constructorRef == "alphatauri", "rb", constructorRef)) |>
  mutate(constructorRef = ifelse(constructorRef == "alfa", "sauber", constructorRef)) |>
  select(constructorRef, driverRef, position) |>
  rename(position2023 = position)

merge(s23, s24, all = TRUE) |>
  arrange(position2023) |>
  kable()
Table 9: Drivers’ results for Suzuka in 2023 and 2024
constructorRef driverRef position2023 position2024
red_bull max_verstappen 1 1
mclaren norris 2 5
mclaren piastri 3 8
ferrari leclerc 4 4
mercedes hamilton 5 9
ferrari sainz 6 3
mercedes russell 7 7
aston_martin alonso 8 6
alpine ocon 9 15
alpine gasly 10 16
rb lawson 11 NA
rb tsunoda 12 10
sauber zhou 13 NA
haas hulkenberg 14 11
haas kevin_magnussen 15 13
aston_martin stroll NA 12
rb ricciardo NA NA
red_bull perez NA 2
sauber bottas NA 14
williams albon NA NA
williams sargeant NA 17
Don’t certain tracks favour certain cars (e.g. due to having fewer slow corners)?

This is a valid objection. We can only really say that this reflects car performance at tracks like Suzuka. Suzuka is a high-speed, high-downforce track with more high-speed than low-speed corners. You can kind of tell by looking that only corners 2, 9, 11, and 14 are slow.

Suzuka Circuit Layout

Data and image from the F1 Formbook.

So perhaps another way of looking at it is that low-speed corners are where drivers make the most difference. That aligns with the excellent recent analysis by Mark Hughes where he explains how drivers like Leclerc and Hamilton make up their time in slow corners.

Naturally certain cars will also be better at slow corners, and that’s not represented at Suzuka. We might be able to gain a better understanding with sector times, but sadly I don’t have this data.

Suzuka-based priors

At this point I’m convinced enough that Suzuka is a useful indicator of relative car performance to use it as a prior, so let’s take the position of each team’s fastest car around Suzuka as the constructor performance priors. Luckily there weren’t any double DNFs - at least one car from each team finished - so every team is represented.

The Stan model is the same; we’re just injecting different prior means and SDs.

Code
suzuka_positions <- suzuka2024 |>
  drop_na() |>
  group_by(constructorRef) |>
  summarise(position = min(position)) |>
  mutate(position = rank(position)) |>
  arrange(constructorRef)

data_list_suzuka <- make_data_list(ctr_positions)
data_list_suzuka$prior_mus <- suzuka_positions$position
data_list_suzuka$prior_sd <- 1

fit_suzuka <- run_stan(data_list_suzuka, "f1.b.stan")

plot_posterior(fit_suzuka)
Figure 9: Posterior samples for the Suzuka-position priors.

It’s not dramatically different. It shouldn’t be: the model and data are the same, and the priors are very similar to the budget-based priors - the fun was in figuring out that Suzuka is car-limited.

Code
plot_posterior_with_prior(ctrs, data_list_suzuka$prior_mus, data_list_suzuka$prior_sd, fit_suzuka)
Figure 10: Posterior samples alongside the Suzuka-position priors.

Most teams didn’t move far from their priors. I’d draw similar conclusions to before: the McLaren was fastest, Red Bull was a close second, and Ferrari were not far behind.

Code
sp <- sample_posterior(fit_suzuka)
sp |>
  group_by(constructorRef) |>
  summarise(mu = mean(position), sigma = sd(position)) |>
  arrange(mu) |>
  kable(digits = 2)
Table 10: Mean and SD from posterior sample, with Suzuka priors.
constructorRef mu sigma
red_bull 2.01 0.31
mclaren 2.05 0.32
ferrari 2.42 0.32
mercedes 3.39 0.39
aston_martin 6.07 0.49
rb 7.47 0.56
haas 7.77 0.58
alpine 8.55 0.64
sauber 9.08 0.52
williams 9.39 0.44

Reintroducing driver skill

Simple model achieved; let’s push on to something more complex, i.e. the model described at the start. Here it is:

data {
  int<lower=1> n_ctrs;
  int<lower=1> n_drvs;
  int<lower=1> n_obs;
  array[n_ctrs] real ctr_mus;
  real<lower=0> ctr_sd;
  array[n_obs] int<lower=1, upper=20> positions;
  array[n_obs, 2] int position_indices;
}
parameters {
  vector[n_ctrs] lambda_ctrs_raw;
  vector[n_drvs] lambda_drvs_raw;
}
transformed parameters {
  vector[n_ctrs] lambda_ctrs;
  vector[n_drvs] lambda_drvs;
  
  // Apply constraints to ensure identifiability
  lambda_ctrs = log(20) * inv_logit(lambda_ctrs_raw);
  lambda_drvs = log(20) * inv_logit(lambda_drvs_raw);
}
model {
  lambda_ctrs_raw ~ normal(ctr_mus, ctr_sd);
  lambda_drvs_raw ~ std_normal();
  
  for (k in 1 : n_obs) {
    int i = position_indices[k, 1];
    int j = position_indices[k, 2];
    positions[k] ~ poisson(exp(lambda_ctrs[i] + lambda_drvs[j]));
  }
}

There is a crucial change. To help make the model identifiable, the raw performance parameters are squashed into (0, 1) with an inverse-logit transform and scaled by log(20). The transformed car and driver parameters are then summed and exponentiated to get the rate for the Poisson distribution.
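
A quick numeric sanity check of that transform in R (plogis() is the inverse logit): with both raw parameters at zero the implied Poisson rate is exp(2 * log(20) * 0.5) = exp(log(20)) = 20, and strongly negative raw values pull it towards 1.

Code
raw <- c(-3, 0, 3)
lambda <- log(20) * plogis(raw) # the transformed-parameters step from the Stan model
round(exp(outer(lambda, lambda, `+`)), 1) # implied Poisson rate for each car/driver pair of raw values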

Code
run_stan2 <- function(data_list, model_file = "f1.c.stan") {
  mod <- cmdstan_model(model_file, exe_file = str_c(model_file, ".bin"))
  mod$sample(data_list, seed = 42, show_messages = FALSE, parallel_chains = 8)
}

ctrs <- sort(unique(f2024$constructorRef))
drvs <- sort(unique(f2024$driverRef))

ctr_index <- tibble(constructorRef = ctrs, ctr_idx = 1:n_distinct(ctrs))
drv_index <- tibble(driverRef = drvs, drv_idx = 1:n_distinct(drvs))

ctr_drv_positions <- f2024 |>
  select(constructorRef, driverRef, position) |>
  arrange(constructorRef, driverRef) |>
  merge(ctr_index) |>
  merge(drv_index) |>
  drop_na()

sample_posterior2 <- function(fit) {
  ctrs <- sort(unique(ctr_drv_positions$constructorRef))
  drvs <- sort(unique(ctr_drv_positions$driverRef))
  df_ctrs <- fit$draws("lambda_ctrs", format = "draws_matrix") |> as_tibble()
  colnames(df_ctrs) <- ctrs

  df_drvs <- fit$draws("lambda_drvs", format = "draws_matrix") |> as_tibble()
  colnames(df_drvs) <- drvs

  list(
    ctrs = df_ctrs |>
      pivot_longer(
        all_of(ctrs),
        names_to = "constructorRef", values_to = "performance"
      ) |>
      mutate(performance = as.double(performance)),
    drvs = df_drvs |>
      pivot_longer(
        all_of(drvs),
        names_to = "driverRef", values_to = "performance"
      ) |>
      mutate(performance = as.double(performance))
  )
}

fit.c <- run_stan2(
  list(
    n_ctrs = n_distinct(ctrs),
    n_drvs = n_distinct(drvs),
    n_obs = nrow(ctr_drv_positions),
    ctr_mus = 0.2 * (-5 + suzuka_positions$position), # shift to centre on 0
    ctr_sd = 0.67, # picked a wide SD because car performance evolved over the season
    positions = ctr_drv_positions$position,
    position_indices = ctr_drv_positions |> select(ctr_idx, drv_idx) |> as.matrix()
  )
)

Let’s first have a look at the constructor performance as determined by the model.

Code
posterior_ctrs <- sample_posterior2(fit.c)$ctrs

ggplot(posterior_ctrs) +
  aes(x = performance, fill = constructorRef) +
  # facet_wrap(~constructorRef) +
  scale_fill_manual(values = ctr_colours$colour) +
  geom_density(alpha = 0.5, colour = "black", linewidth = 0.2, show.legend = TRUE)
Figure 11: Constructor performance parameters determined by model. Lower (left) is better.
Code
posterior_ctrs |>
  group_by(constructorRef) |>
  summarise(mean_performance = mean(performance), sd = sd(performance)) |>
  arrange(mean_performance) |>
  mutate(rank = rank(mean_performance)) |>
  kable(digits = 2)
Table 11: Constructor performance means and SDs from the posterior.
constructorRef mean_performance sd rank
red_bull 0.57 0.19 1
ferrari 0.70 0.21 2
mclaren 0.75 0.23 3
mercedes 0.95 0.26 4
aston_martin 1.18 0.36 5
haas 1.31 0.28 6
rb 1.34 0.34 7
alpine 1.55 0.31 8
williams 1.62 0.30 9
sauber 1.66 0.36 10

Note that the sampled parameter is now latent performance, not positions, so the numbers don’t have a real-world meaning. What matters is relative performance, where a lower number is better.

The model suggests - controversially, but with appropriate uncertainty - that the Red Bull was the fastest car, followed by Ferrari and then the McLaren. However the relative positioning of the other teams seems about right.

Forgive me for not labelling all the drivers on this next plot, but there’s only so much plot fiddling I’m willing to do and it’s usually obvious which is which.

Code
posterior_drvs <- sample_posterior2(fit.c)$drvs |>
  merge(ctr_drvs) |>
  group_by(driverRef) |>
  mutate(x = mean(performance))

ggplot(posterior_drvs) +
  aes(x = performance, fill = driverRef, label = driverRef) +
  facet_wrap(~constructorRef) +
  scale_fill_manual(values = drv_colours$colour) +
  geom_density(alpha = 0.5, colour = "black", linewidth = 0.2, show.legend = FALSE) +
  geom_text(data = posterior_drvs, mapping = aes(x = x, y = 0.75, angle = 90), check_overlap = TRUE)
Figure 12: Driver performance parameters determined by model. Lower (left) is better.
Code
posterior_drvs |>
  group_by(driverRef) |>
  summarise(mean_performance = mean(performance), sd = sd(performance)) |>
  arrange(mean_performance) |>
  mutate(rank = rank(mean_performance)) |>
  kable(digits = 2)
Table 12: Driver performance means and SDs from the posterior.
driverRef mean_performance sd rank
max_verstappen 0.55 0.20 1
leclerc 0.68 0.22 2
norris 0.71 0.24 3
russell 0.78 0.26 4
colapinto 0.81 0.31 5
hamilton 0.83 0.26 6
sainz 0.86 0.22 7
piastri 0.88 0.24 8
gasly 0.89 0.32 9
albon 0.93 0.31 10
ocon 1.03 0.31 11
hulkenberg 1.06 0.29 12
zhou 1.06 0.36 13
bottas 1.08 0.36 14
alonso 1.11 0.36 15
bearman 1.12 0.29 16
tsunoda 1.13 0.34 17
doohan 1.16 0.37 18
sargeant 1.19 0.31 19
ricciardo 1.20 0.34 20
kevin_magnussen 1.25 0.29 21
lawson 1.26 0.35 22
stroll 1.36 0.36 23
perez 1.49 0.21 24

The model suggests that the top drivers performed similarly, with Verstappen fastest. Poor Perez comes dead last, because the model has decided that the Red Bull was the fastest car.

There are some obvious issues though. Alonso is considered only the 15th best performer out of 24, which is absurd. This could be because the Suzuka prior placed Aston Martin ahead of Mercedes whereas they actually finished the season well behind.

If the informative prior is so bad, why don’t we just use uninformative priors? If we do that, the model won’t be able to separate the contributions of the car and the driver. We need a better prior.

There will also be some effect from the way I’ve simply excluded DNFs rather than penalising driver-fault DNFs.

Weaknesses of this model

To an extent I’m just role-playing being Dr Phillips here, so your number one takeaway should be: prefer the F1 Metrics model or the Bell et al. (2016) model, which is a much more sophisticated Bayesian model. If you are seriously interested in estimating driver performance, please check those out.

Nonetheless, there are some obvious weaknesses in this model that it’s worth highlighting.

Use of positions rather than points

Phillips makes a good argument for a non-linear reward function, since it punishes drivers more fairly for DNFs. That would be an improvement here.

No assignment of DNF blame

I’ve just discarded all that data, which flatters the DNF-prone drivers. Though I note that the drivers near the bottom of the performance rankings in my model were more DNF-prone in 2024 anyway, so perhaps there’s a correlation between low performance and DNFs, or those drivers had some races where they wounded the car but finished anyway.
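
If I were to fix this, a sketch might look like the following: a hand-coded dnf_blame table (which doesn’t exist in the Ergast data, so the raceId row below is invented) joined in so that driver-fault DNFs score as last place while mechanical DNFs stay excluded.

Code
dnf_blame <- tribble(
  ~raceId, ~driverRef, ~driver_fault,
  1144, "albon", TRUE # invented example row; the real table needs hand-coding
)

f2024_penalised <- f2024 |>
  left_join(dnf_blame, by = c("raceId", "driverRef")) |>
  mutate(position = case_when(
    !is.na(position) ~ position,
    coalesce(driver_fault, FALSE) ~ 20, # driver-fault DNF scored as last place
    TRUE ~ NA_real_ # mechanical DNF: still excluded
  ))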

Only a single season considered

With more data for each driver - across different cars - the model would be better able to distinguish driver and car performance. Note that it would be necessary to model each constructor by year because the car performance varies so much.

This suggests that we might benefit from having multiple representations of car performance within each year, e.g. splitting each team into A and B specs according to when they brought their largest upgrade. The challenge with that (apart from the tedious business of determining the break points) is that we’d have less data for each car and also there are perhaps several step changes in performance throughout the season.
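
As a flavour of what that might look like, here’s a hypothetical sketch with invented break points (McLaren’s big Miami upgrade is the kind of thing I mean): recode the constructor per "spec" and model that instead.

Code
upgrade_rounds <- tribble(
  ~constructorRef, ~upgrade_round,
  "mclaren", 6, # e.g. the Miami upgrade; the Red Bull break point is invented
  "red_bull", 11
)

f2024_specs <- f2024 |>
  left_join(upgrade_rounds, by = "constructorRef") |>
  mutate(
    spec = ifelse(!is.na(upgrade_round) & round >= upgrade_round, "B", "A"),
    constructorSpec = str_c(constructorRef, "_", spec) # model this instead of constructorRef
  )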

Summing it all up

What did we learn?

  1. How to use Stan to sample the posterior for a Bayesian model in R.
  2. Very little about F1.

Perhaps point 2 is under-selling it a little, since we did learn that Suzuka is quite good for assessing relative constructor performance, and the model turned out some reasonable results for car and driver performance. There are some major caveats of course, but it is the kind of model we could build on. An obvious future step is to look at the other car-limited tracks - hopefully some that are at a different point in the season and have more slow corners than Suzuka - and make the prior a combination of results at these tracks. If I try that out in future I’ll be sure to post an update here.