Latent F1 team and driver performance: fun with Bayesian models

Categories: bayesian, r, f1, stan

Published: December 27, 2024

For a bit of fun, let’s try and model the performance of F1 drivers and constructors in the 2024 season using Bayesian models. The aim is more to learn about using Stan than to determine deep truths about F1, since there’s far more to performance than we can easily model.

We’ll vaguely imitate the f1-metrics model, which I used to await eagerly after each season. In that model there are latent variables for driver skill and car quality, which predict points/positions. However, I never did get hold of the paper behind it, so instead I’m following the amazing Rasmus Bååth’s blog, where he models the latent skill of football teams in La Liga.

I gathered CSVs of race data from the fantastic Ergast. The weapon of choice is R and the tidyverse, since nothing else is quite so ergonomic for this type of analysis. You can find the code behind the folds if you want to play along.

Code
library(tidyverse)
library(knitr) # this lib is for rendering tables nicely

qualifying <- read_csv("data/qualifying.csv") |> rename(quali_position = position)
races <- read_csv("data/races.csv") |>
  select(raceId, year, round, circuitId, name) |>
  rename(gp = name)
results <- read_csv("data/results.csv") |> select(resultId, raceId, driverId, constructorId, grid, position, points, milliseconds)
constructors <- read_csv("data/constructors.csv") |> select(constructorId, constructorRef)
drivers <- read_csv("data/drivers.csv") |> select(driverId, driverRef)
circuits <- read_csv("data/circuits.csv") |> select(circuitId, circuitRef)

ctr_drvs <- tribble(
  ~constructorRef, ~driverRef,
  "alpine", "doohan",
  "alpine", "gasly",
  "alpine", "ocon",
  "aston_martin", "alonso",
  "aston_martin", "stroll",
  "ferrari", "bearman",
  "ferrari", "leclerc",
  "ferrari", "sainz",
  "haas", "bearman",
  "haas", "hulkenberg",
  "haas", "kevin_magnussen",
  "mclaren", "norris",
  "mclaren", "piastri",
  "mercedes", "hamilton",
  "mercedes", "russell",
  "rb", "lawson",
  "rb", "ricciardo",
  "rb", "tsunoda",
  "red_bull", "max_verstappen",
  "red_bull", "perez",
  "sauber", "bottas",
  "sauber", "zhou",
  "williams", "albon",
  "williams", "colapinto",
  "williams", "sargeant",
)

ctr_colours <- tribble(
  ~constructorRef, ~colour,
  "red_bull", "#3671c6",
  "mercedes", "#26f5d2",
  "ferrari", "#e80220",
  "mclaren", "#ff8001",
  "aston_martin", "#229971",
  "alpine", "#F363B9", # Should be "#0093cc" but there's too much blue
  "williams", "#63c4ff",
  "rb", "#6592ff",
  "sauber", "#52e252",
  "haas", "#b6babd"
) |>
  arrange(constructorRef)

drv_colours <- ctr_colours |>
  merge(ctr_drvs) |>
  group_by(driverRef) |>
  arrange(constructorRef) |>
  slice_head(n = 1) |> # alphabetical first for multi-team drivers i.e. Bearman
  select(driverRef, colour) |>
  arrange(driverRef)

f1data <- races |>
  merge(results) |>
  merge(qualifying) |>
  merge(constructors) |>
  merge(drivers) |>
  merge(circuits) |>
  select(raceId, gp, year, round, circuitRef, constructorRef, driverRef, q1, q2, q3, quali_position, grid, position) |>
  mutate(position = ifelse(position == "\\N", NA, as.numeric(position)))

In the interests of being educational - and because Lord knows I hate editing - I am writing this post as I go, including any blind alleys and debugging.

Let’s start with some visualisations of the data we have.

Code
f2024 <- f1data |> filter(year == 2024)

ggplot(f2024) +
  aes(x = position, fill = driverRef) +
  facet_wrap(~driverRef) +
  scale_fill_manual(values = drv_colours$colour) +
  geom_histogram(alpha = 0.5, colour = "black", linewidth = 0.1, binwidth = 1, show.legend = FALSE)
Figure 1: Histograms of driver performance in F1 2024.

The model goes like this. Points are simulated as a draw from a Poisson distribution parameterised by the driver performance and constructor performance.

\[ \text{points} \sim \text{Poisson}(e^{\text{perf}_{\text{drv}} + \text{perf}_{\text{ctr}}}) \]

Driver performance is simulated as a draw from a normal distribution.

\[ \text{perf}_{\text{drv}} \sim \text{Normal}(\mu_{\text{drv}}, \sigma_{\text{drv}}^2) \]

Likewise constructor performance is simulated as a draw from a normal distribution.

\[ \text{perf}_{\text{ctr}} \sim \text{Normal}(\mu_{\text{ctr}}, \sigma_{\text{ctr}}^2) \]

We will therefore have a likelihood function that is the product of the Poisson densities of the points for each driver in each race. The number of parameters is fairly high: a performance score for each driver and for each constructor.
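
To make this concrete, here’s a minimal simulation sketch of the generative model just described, with made-up hyperparameters (the real model will of course infer the performance scores rather than draw them from values I picked).

Code
set.seed(1)
n_ctrs_sim <- 10
ctr_perf <- rep(rnorm(n_ctrs_sim, mean = 0, sd = 0.5), each = 2) # perf_ctr, two drivers per car
drv_perf <- rnorm(2 * n_ctrs_sim, mean = 0, sd = 0.5) # perf_drv
sim_points <- rpois(length(drv_perf), lambda = exp(drv_perf + ctr_perf)) # points ~ Poisson(e^(perf_drv + perf_ctr))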

Constructor performance

Let’s start simpler, with a view of just the constructor performance. Assume that at least one driver for every constructor maximised the car’s performance each weekend. THIS ISN’T TRUE. If Albon had a poor weekend, Sargeant wasn’t going to step up, to give one of many examples from the 2024 grid. However it is a useful approximation that lets us analyse just the constructor performance.

Code
ctr_positions <- f2024 |>
  group_by(round, constructorRef) |>
  summarise(quali_position = min(quali_position), position = min(position)) |>
  mutate(
    # re-rank
    quali_position = rank(quali_position),
    position = rank(position)
  ) |>
  arrange(round, position)

ggplot(ctr_positions) +
  aes(x = position, fill = constructorRef) +
  facet_wrap(~constructorRef) +
  scale_fill_manual(values = ctr_colours$colour) +
  geom_histogram(alpha = 0.5, colour = "black", linewidth = 0.1, binwidth = 1, show.legend = FALSE)
Figure 2: Histogram of re-ranked constructor results in F1 2024.

A key point here is that we’ve recalculated positions using only each constructor’s best-placed driver, i.e. there are now only 10 finishing positions.

Again we can simulate the position \(p\) as a draw from a Poisson distribution, this time where \(\lambda_\text{ctr}\) is the performance of the constructor only.

\[ p \sim \text{Poisson}(\lambda_{\text{ctr}}) \]

To keep things simple, let’s start with an uninformative prior, a uniform distribution for the performance of the constructors.

\[ \lambda_{\text{ctr}} \sim \text{Uniform}(1, 10) \]

Then our likelihood function is much simpler. (We won’t actually need this, but I find it useful to write out for the sake of understanding.)
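
Concretely, with independent observations the log-likelihood is just a sum of Poisson log-densities over constructors and rounds:

\[ \log L(\lambda \mid p) = \sum_{\text{ctr}} \sum_{r} \log \text{Poisson}(p_{\text{ctr},r} \mid \lambda_{\text{ctr}}) \]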

Code
log_lik_ctr <- function(ctr_positions, ctr_perfs) {
  ctrs <- sort(unique(ctr_positions$constructorRef))
  log_lik <- 0.0 # a log-likelihood accumulates from zero, not one
  for (i in seq_along(ctrs)) {
    race_posns <- filter(ctr_positions, constructorRef == ctrs[i])$position
    log_lik <- log_lik + sum(dpois(race_posns, lambda = ctr_perfs[i], log = TRUE))
  }
  log_lik
}

Let’s throw Stan at this now. Here’s our Stan model file.

data {
  int<lower=1> n_ctrs;
  int<lower=1> n_obs;
  array[n_obs] int<lower=1, upper=10> ctrs;
  array[n_obs] int<lower=1, upper=10> positions;
}
parameters {
  array[n_ctrs] real<lower=1, upper=10> lambda;
}
model {
  lambda ~ uniform(1, 10);
  positions ~ poisson(lambda[ctrs]) T[1, 10];

  // the above "distribution" syntax is equivalent to:
  //target += uniform_lpdf(lambda | 1, 10);
  //target += poisson_lpmf(positions | lambda[ctrs]);
}
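
As an aside, the T[1, 10] suffix truncates the Poisson to the support 1-10, which (as I understand it) just renormalises the probability mass over that range. A quick R sketch of the same idea:

Code
# not part of the model: the truncated Poisson pmf that T[1, 10] implies,
# i.e. dpois renormalised to the support 1..10
dpois_trunc <- function(k, lambda, lo = 1, hi = 10) {
  dpois(k, lambda) / (ppois(hi, lambda) - ppois(lo - 1, lambda))
}
sum(dpois_trunc(1:10, lambda = 4)) # sums to 1 over the truncated support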

Below the fold is the R code for interacting with it via CmdStanR. I settled on CmdStanR rather than RStan after hitting too many nameless runtime errors through RStan - see here.

Code
library(cmdstanr, quietly = TRUE)
check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)

ctrs <- sort(unique(ctr_positions$constructorRef))
ctr_index <- tibble(
  constructorId = 1:length(ctrs),
  constructorRef = ctrs,
)

make_data_list <- function(ctr_positions) {
  tidy_ctr_posns <- ctr_positions |> merge(ctr_index)

  list(
    n_ctrs = n_distinct(tidy_ctr_posns$constructorId),
    n_obs = nrow(tidy_ctr_posns),
    ctrs = tidy_ctr_posns$constructorId,
    positions = tidy_ctr_posns$position
  )
}

run_stan <- function(data_list, model_file = "f1.stan") {
  mod <- cmdstan_model(model_file, exe_file = str_c(model_file, ".bin"))
  mod$sample(data_list, seed = 42, show_messages = FALSE)
}

fit <- run_stan(make_data_list(ctr_positions))
fit
  variable    mean  median   sd  mad      q5     q95 rhat ess_bulk ess_tail
 lp__      1066.12 1066.51 2.55 2.39 1061.39 1069.56 1.00     1354     1292
 lambda[1]    7.99    7.97 0.78 0.83    6.74    9.34 1.00     2800     1333
 lambda[2]    6.34    6.31 0.60 0.60    5.42    7.40 1.00     3941     2479
 lambda[3]    2.53    2.52 0.36 0.35    1.97    3.14 1.00     4395     2641
 lambda[4]    6.86    6.84 0.67 0.67    5.82    8.00 1.00     3546     1968
 lambda[5]    1.70    1.68 0.30 0.30    1.24    2.20 1.00     3741     1634
 lambda[6]    4.12    4.11 0.43 0.44    3.42    4.86 1.00     4405     3065
 lambda[7]    7.48    7.43 0.77 0.78    6.28    8.79 1.00     1509      649
 lambda[8]    3.62    3.61 0.42 0.41    2.96    4.33 1.00     4388     3150
 lambda[9]    9.49    9.60 0.43 0.38    8.65    9.97 1.00     2975     1581

 # showing 10 of 11 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)

Let’s tidy that up though, mapping parameters back to constructor names.

Code
sample_posterior <- function(fit) {
  df <- fit$draws("lambda", format = "draws_matrix") |> as_tibble()
  colnames(df) <- ctrs
  df |>
    pivot_longer(
      all_of(ctrs),
      names_to = "constructorRef", values_to = "position"
    ) |>
    mutate(position = as.double(position))
}

plot_posterior <- function(fit) {
  ggplot(sample_posterior(fit)) +
    aes(x = position, fill = constructorRef) +
    scale_fill_manual(values = ctr_colours$colour) +
    geom_density(alpha = 0.5, colour = "black", linewidth = 0.1)
}

plot_posterior(fit)
Figure 3: Posterior samples for constructor performance

On this plot lower x-values are better.

The interesting thing is that this suggests the Red Bull was clearly the third-fastest car. I’m a little suspicious of that, because it claimed a sequence of 1-2s in the first third of the season. Let’s take a closer look at that.

The first thing to check is whether my dumb “max driver, 10 positions” model is problematic. Let’s look at Red Bull’s results.

Code
f2024 |>
  filter(constructorRef == "red_bull") |>
  select(gp, round, driverRef, position) |>
  pivot_wider(names_from = "driverRef", values_from = "position") |>
  arrange(round) |>
  merge(
    ctr_positions |> filter(constructorRef == "red_bull") |> select(round, position)
  ) |>
  kable()
Table 1: Red Bull’s results in F1 2024
round gp perez max_verstappen position
1 Bahrain Grand Prix 2 1 1
2 Saudi Arabian Grand Prix 2 1 1
3 Australian Grand Prix 5 NA 10
4 Japanese Grand Prix 2 1 1
5 Chinese Grand Prix 3 1 1
6 Miami Grand Prix 4 2 2
7 Emilia Romagna Grand Prix 8 1 1
8 Monaco Grand Prix NA 6 10
9 Canadian Grand Prix NA 1 9
10 Spanish Grand Prix 8 1 1
11 Austrian Grand Prix 7 5 4
12 British Grand Prix 17 2 1
13 Hungarian Grand Prix 7 5 4
14 Belgian Grand Prix 7 4 3
15 Dutch Grand Prix 6 2 2
16 Italian Grand Prix 8 6 4
17 Azerbaijan Grand Prix 17 5 4
18 Singapore Grand Prix 10 2 2
19 United States Grand Prix 7 3 2
20 Mexico City Grand Prix 17 6 4
21 São Paulo Grand Prix 11 1 1
22 Las Vegas Grand Prix 10 5 3
23 Qatar Grand Prix NA 1 9
24 Abu Dhabi Grand Prix NA 6 8

Round three is clearly a mistake: how can Perez’s 5th place become the 10th constructor position? Let’s work through it again.

Code
f2024 |>
  group_by(round, constructorRef) |>
  summarise(position = min(position)) |>
  mutate(reranked = rank(position)) |>
  filter(round == 3) |>
  kable()
Table 2: Round 3 results.
round constructorRef position reranked
3 alpine 13 7
3 aston_martin 6 3
3 ferrari 1 1
3 haas 9 5
3 mclaren 3 2
3 mercedes NA 9
3 rb 7 4
3 red_bull NA 10
3 sauber 14 8
3 williams 11 6

NA strikes again! I forgot the crucial na.rm argument to min(), so any DNF turned the constructor’s best position into NA, which rank() then placed last. Right, a do-over.
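
For the record, here’s a two-line illustration (with made-up values) of what went wrong: min() propagates the NA from a DNF, and rank() then places the NA last, which in a 10-constructor table means 10th.

Code
min(c(5, NA)) # NA - one DNF wipes out the teammate's P5
rank(c(1, 3, NA, 2)) # 1 3 4 2 - rank() puts the NA last by default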

Code
ctr_positions <- f2024 |>
  group_by(round, constructorRef) |>
  summarise(
    # in the event of a double DNF, assign the last position
    quali_position = pmin(min(quali_position, na.rm = TRUE), 10),
    position = pmin(min(position, na.rm = TRUE), 10)
  ) |>
  mutate(
    # re-rank
    quali_position = rank(quali_position),
    position = rank(position)
  ) |>
  arrange(round, position)

ggplot(ctr_positions) +
  aes(x = position, fill = constructorRef) +
  facet_wrap(~constructorRef) +
  scale_fill_manual(values = ctr_colours$colour) +
  geom_histogram(alpha = 0.5, colour = "black", linewidth = 0.1, binwidth = 1, show.legend = FALSE)
Figure 4: Constructor position histograms after fixing NA sorting.

That looks more like I’d expect. Let’s try Stan again.

Code
fit <- run_stan(make_data_list(ctr_positions))
plot_posterior(fit)
Figure 5: Posterior samples after fixing NA sorting.

That is more intuitive: Red Bull is now much closer to McLaren and marginally ahead of Ferrari (despite coming third in the constructors’ championship). This tells us that yes, Verstappen probably would have won the drivers’ championship in a McLaren - but contrary to claims, not in a Ferrari, since it wasn’t faster than the Red Bull overall.

The three teams that are on top of each other are: Alpine, Haas, and RB. You can also just about discern that the model is less sure about the top teams, since their posteriors are slightly wider.

There’s a lot we can improve about this model. The most obvious issue is that our priors for constructor performance are uninformative, and so our posteriors basically just reflect the data. Can we add some useful information to inform the model?

More informative priors

What makes some constructors better than others? Adrian Newey aside: money. Let’s reflect this in our priors.

\[ \lambda_{\text{ctr}} \sim \text{Normal}(\text{budget}_\text{ctr}, \sigma) \]

Budget data

This is a bit of a secret, so we have to estimate. We do know that the budget cap is 135M USD and it’s unlikely that anyone is operating below the cap. We also know that driver and top executive salaries are exempt from the cap. The top teams are spending about 100M USD on their drivers, and prior to the budget cap’s implementation likely invested in facilities that they can use for free (like wind tunnels).

With that in mind, I asked GPT-4o to search the web and estimate budgets for the teams. It came back with these results, which I sanity checked against this Blackbook Motorsport article.

Code
budgets <- tribble(
  ~constructorRef, ~budget,
  "red_bull", 400,
  "mercedes", 400,
  "ferrari", 400,
  "mclaren", 250,
  "aston_martin", 250,
  "alpine", 200,
  "rb", 150,
  "haas", 150,
  "williams", 150,
  "sauber", 150
) |>
  mutate(
    norm_budget = 1 - budget / max(budget)
  )
kable(select(budgets, -norm_budget))
Table 3: Constructor budget estimates (USD millions)
constructorRef budget
red_bull 400
mercedes 400
ferrari 400
mclaren 250
aston_martin 250
alpine 200
rb 150
haas 150
williams 150
sauber 150

Let’s update our model and run Stan again.

data {
  int<lower=1> n_ctrs;
  int<lower=1> n_obs;
  array[n_ctrs] real<lower=1, upper=10> prior_mus; // real, since the budget-based prior means are fractional
  real<lower=0> prior_sd;
  array[n_obs] int<lower=1, upper=10> ctrs;
  array[n_obs] int<lower=1, upper=10> positions;
}
parameters {
  array[n_ctrs] real<lower=1, upper=10> lambda;
}
model {
  lambda ~ normal(prior_mus, prior_sd);
  positions ~ poisson(lambda[ctrs]) T[1, 10];
}
Code
data_list_b <- make_data_list(ctr_positions)
# map the normalised budgets onto the model's 1-10 position scale (biggest budget -> prior mean of 1)
data_list_b$prior_mus <- 1 + 10 * arrange(budgets, constructorRef)$norm_budget
data_list_b$prior_sd <- 1.5
fit.b <- run_stan(data_list_b, model_file = "f1.b.stan")

plot_posterior(fit.b)
Figure 6: Posterior samples from the model with budget-based priors.

This hasn’t made much difference. McLaren has eased back towards Red Bull because their budget is smaller. You could say they did a lot better than Red Bull relative to their budget.

Code
sp <- sample_posterior(fit.b)
sp |>
  group_by(constructorRef) |>
  summarise(mu = mean(position), sigma = sd(position)) |>
  arrange(mu) |>
  kable(digits = 2)
Table 4: Mean and SD from posterior sample.
constructorRef mu sigma
mclaren 2.06 0.33
red_bull 2.07 0.32
ferrari 2.40 0.32
mercedes 3.02 0.36
aston_martin 6.42 0.56
alpine 7.80 0.66
haas 8.00 0.67
rb 8.07 0.67
williams 8.82 0.63
sauber 9.12 0.56

This model suggests there’s very little in it between Red Bull and McLaren, though the posterior distribution is wider for the top three teams than the others, so there is more uncertainty.

Was budget a bad choice for the prior? Not necessarily: this model of constructor performance incorporates an important factor that isn’t clear from the results, but which should affect an objective assessment of which teams are better. Or to put it another way, if I were Max Verstappen I’d still rather be driving the Red Bull than the McLaren next year, because Red Bull’s budget suggests a more capable team overall. (Not to mention the increased wind tunnel time that third place in the constructors’ gets relative to the winner.)

Code
plot_posterior_with_prior <- function(ctrs, prior_mus, prior_sd, fit) {
  s_post <- sample_posterior(fit) |> mutate(sample = "posterior")

  ctr_index <- tibble(constructorRef = ctrs, constructorId = 1:length(ctrs))

  s_prior <- prior_mus |>
    lapply(\(mu) tibble(position = rnorm(100000, mu, prior_sd))) |>
    imap(\(t, i) mutate(t, constructorId = i, sample = "prior")) |>
    bind_rows() |>
    merge(ctr_index)

  df <- bind_rows(s_prior, s_post)

  ggplot(df) +
    aes(x = position, fill = sample) +
    facet_wrap(~constructorRef) +
    geom_density(alpha = 0.5, linewidth = 0.1) +
    scale_x_continuous(breaks = seq(1, 10, by = 2), limits = c(1, 10)) +
    scale_fill_manual(values = c("dodgerblue1", "dodgerblue4"))
}

plot_posterior_with_prior(ctrs, data_list_b$prior_mus, data_list_b$prior_sd, fit.b)
Figure 7: Comparison of prior and posterior samples.

Plotting the priors and posteriors together shows how McLaren over-performed and Mercedes continued to underperform.

Past performance

Another, very easy way to set the priors is to use the constructors’ performance over the previous seasons under the same regulations, in this case 2022 and 2023 (the ground-effect era). We would just take a weighted mean of each constructor’s finishing positions as the prior mean, and perhaps set the variance in proportion to how much its position changed between the two seasons. Not particularly innovative or interesting though, so I’ll only sketch it below and move on.
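
For completeness, here’s a rough, unrun sketch of what that would look like, reusing the f1data frame from above and mapping the 2022-2023 team names onto their 2024 equivalents. The double weighting of 2023 and the variance rule are arbitrary choices for illustration.

Code
past_ctr_positions <- f1data |>
  filter(year %in% c(2022, 2023)) |>
  mutate(constructorRef = case_match(
    constructorRef,
    "alphatauri" ~ "rb",
    "alfa" ~ "sauber",
    .default = constructorRef
  )) |>
  group_by(year, round, constructorRef) |>
  summarise(position = min(position, na.rm = TRUE), .groups = "drop_last") |>
  mutate(position = rank(position)) |> # re-rank to 1-10 within each round
  ungroup()

past_priors <- past_ctr_positions |>
  group_by(year, constructorRef) |>
  summarise(mean_position = mean(position), .groups = "drop") |>
  group_by(constructorRef) |>
  summarise(
    prior_mu = weighted.mean(mean_position, w = ifelse(year == 2023, 2, 1)), # weight 2023 double
    prior_sd = pmax(1, abs(diff(mean_position))) # wider where the two seasons disagree
  )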

Car-limited circuits

Long-time F1 fans will know that at some tracks, the cars almost always finish in pairs, e.g. both Ferraris, then both Mercedes, and so on. Somehow at these circuits driver skill makes little difference; we could say the circuits are car-limited rather than driver-limited. That’s no fun to watch, but I wonder if it’s useful for establishing car performance?

Let’s look back at historic race results to see if some circuits are more likely to finish in two-by-two order. We can score this by the median difference in finishing position between teammates, counting only races where both cars finished.

Code
circuits2024 <- races |>
  merge(circuits) |>
  filter(year == 2024) |>
  select(circuitRef)

position_gaps <- f1data |>
  filter(year > 2014) |> # hybrid era onwards, to try and keep it relevant to modern F1 car sizes
  merge(circuits2024) |> # circuits from 2024 only
  select(year, round, circuitRef, constructorRef, driverRef, position) |>
  filter(position != "\\N") |> # ignore DNFs
  mutate(position = as.integer(position)) |>
  group_by(year, round, circuitRef, constructorRef) |>
  filter(n() == 2) |> # must be a double finish
  summarise(gap = max(position) - min(position), max_position = max(position)) |>
  group_by(circuitRef) |>
  mutate(median_gap = median(gap))

ggplot(position_gaps) +
  aes(x = gap) +
  facet_wrap(vars(circuitRef)) +
  geom_histogram(fill = "dodgerblue1", alpha = 0.5, colour = "black", linewidth = 0.1, binwidth = 1) +
  geom_vline(mapping = aes(xintercept = median_gap), colour = "dodgerblue4")
Figure 8: Distribution of finishing position gaps between teammates by circuit, with median line.

Perhaps unsurprisingly given that F1 isn’t a spec series, the gap between drivers of the same car is usually very low. I went back and forth on whether to discard cars that finished outside the top ten points-paying positions, as this often includes cars that were retired to save expense or were wounded but finished the race well below their potential for the sake of data gathering. In the end I kept them in because we need two-car finishes to calculate the statistics and setting a threshold would mean re-coding all the finish positions.

It turns out to make little difference anyway. The circuit with the highest probability of teammates finishing in consecutive positions is Suzuka, whether or not you apply a threshold.

Code
rounds2024 <- races |>
  filter(year == 2024) |>
  merge(circuits) |>
  select(circuitRef, round) |>
  rename(round2024 = round)

p_gap_1 <- position_gaps |>
  count(circuitRef, gap) |>
  group_by(circuitRef) |>
  mutate(p_gap = n / sum(n)) |>
  filter(gap == 1) |>
  ungroup() |>
  merge(rounds2024)

p_gap_1 |>
  slice_max(p_gap, n = 10) |>
  kable(digits = 2)
Table 5: Circuits with highest probability of having teammates finishing together.
circuitRef gap n p_gap round2024
suzuka 1 29 0.48 4
albert_park 1 19 0.41 3
rodriguez 1 26 0.39 20
jeddah 1 10 0.38 2
yas_marina 1 29 0.36 24
imola 1 10 0.33 7
hungaroring 1 26 0.32 13
shanghai 1 15 0.31 5
bahrain 1 25 0.30 1
interlagos 1 19 0.30 21

How did the cars line up at Suzuka in 2024, noting that this was round 4 of a 24-round championship?

Code
suzuka2024 <- f1data |>
  filter(year == 2024) |>
  filter(circuitRef == "suzuka") |>
  select(year, round, circuitRef, constructorRef, driverRef, quali_position, position)

suzuka2024 |>
  drop_na() |>
  group_by(constructorRef) |>
  filter(n() == 2) |>
  arrange(position) |>
  kable()
Table 6: Results for Suzuka 2024 (two-car finishes only)
year round circuitRef constructorRef driverRef quali_position position
2024 4 suzuka red_bull max_verstappen 1 1
2024 4 suzuka red_bull perez 2 2
2024 4 suzuka ferrari sainz 4 3
2024 4 suzuka ferrari leclerc 8 4
2024 4 suzuka mclaren norris 3 5
2024 4 suzuka aston_martin alonso 5 6
2024 4 suzuka mercedes russell 9 7
2024 4 suzuka mclaren piastri 6 8
2024 4 suzuka mercedes hamilton 7 9
2024 4 suzuka haas hulkenberg 12 11
2024 4 suzuka aston_martin stroll 16 12
2024 4 suzuka haas kevin_magnussen 18 13
2024 4 suzuka alpine ocon 15 15
2024 4 suzuka alpine gasly 17 16

The cars are quite well in order here, if you ignore the gulf between Alonso and Stroll. It looks like Suzuka is really a car-limited track. Incidentally, this would suggest that in round 4, McLaren and Mercedes were quite evenly matched.

Counter-arguments

Are we sure Suzuka is a car-limited track?

Suzuka is a track “drivers love”! Surely it’s a more skill-dependent track?

I would have expected so! Suzuka is described as a technical track, with minimal room for error. Perhaps there aren’t as many potential lines that a driver could take to distinguish themselves. It’s also a good track for overtaking, so any driver that does qualify out of position is able to recover. Or perhaps the reason drivers love it is because they get to drive their cars on the limit: the limit of the car’s performance.

Either way, statistically drivers are more likely to end up in pairs in Suzuka than anywhere else.

Suzuka was an early round. Perhaps car performance converges over the season?

On the one hand we’d expect to see cars start at different levels but converge throughout the season, as the teams spot innovations on other cars that they can copy. Or secret design documents that they can copy, ahem. On the other hand, constructors in a tight battle would be motivated to develop their cars further - for example in 2022 when Red Bull overhauled an early Ferrari advantage.

As it happens, Suzuka has historically been a mid-to-late-season round, which suggests that its early slot in 2024 isn’t why it looks car-limited.

Code
races |>
  filter(year >= 2014) |>
  group_by(year) |>
  mutate(n_rounds = n_distinct(round)) |>
  ungroup() |>
  merge(circuits) |>
  filter(circuitRef == "suzuka") |>
  select(year, round, n_rounds) |>
  arrange(year, round) |>
  kable()
Table 7: Suzuka’s placement in each season
year round n_rounds
2014 15 19
2015 14 19
2016 17 21
2017 16 20
2018 17 21
2019 17 21
2022 18 22
2023 16 22
2024 4 24

However the fact that it was an early round in 2024 does mean that car performance will have changed over the season. Suzuka can only ever be a point-in-time assessment.

Are we sure that consecutive finishes aren’t indicative of similar driver skill levels?

The top teams can also hire the best drivers, so is it not possible that we’re still seeing the drivers finish in skill order? That is, the circuit is actually driver-limited, but most drivers happen to be paired with similarly fast teammates - Alonso and Stroll being the clear exception in the 2024 race results.

On the other hand, between 2023 and 2024 there were no line-up changes among the top 5 teams, so we can easily compare across the two years.

Code
suzuka2023 <- f1data |>
  filter(year == 2023) |>
  filter(circuitRef == "suzuka") |>
  select(year, round, circuitRef, constructorRef, driverRef, quali_position, position)

suzuka2023 |>
  drop_na() |>
  group_by(constructorRef) |>
  filter(n() == 2) |>
  arrange(position) |>
  kable()
Table 8: Results for Suzuka 2023 (two-car finishes only)
year round circuitRef constructorRef driverRef quali_position position
2023 16 suzuka mclaren norris 3 2
2023 16 suzuka mclaren piastri 2 3
2023 16 suzuka ferrari leclerc 4 4
2023 16 suzuka mercedes hamilton 7 5
2023 16 suzuka ferrari sainz 6 6
2023 16 suzuka mercedes russell 8 7
2023 16 suzuka alpine ocon 14 9
2023 16 suzuka alpine gasly 12 10
2023 16 suzuka alphatauri lawson 11 11
2023 16 suzuka alphatauri tsunoda 9 12
2023 16 suzuka haas hulkenberg 18 14
2023 16 suzuka haas kevin_magnussen 15 15

The order of constructors is very different, despite all the drivers being the same except at AlphaTauri/RB. Just for fun, here’s a side-by-side comparison for each driver for 2023 and 2024.

Code
s24 <- suzuka2024 |>
  select(constructorRef, driverRef, position) |>
  rename(position2024 = position)

s23 <- suzuka2023 |>
  # correct to the new names for joining
  # (yes I could have used constructorId, but opaque IDs are so tedious to handle)
  mutate(constructorRef = ifelse(constructorRef == "alphatauri", "rb", constructorRef)) |>
  mutate(constructorRef = ifelse(constructorRef == "alfa", "sauber", constructorRef)) |>
  select(constructorRef, driverRef, position) |>
  rename(position2023 = position)

merge(s23, s24, all = TRUE) |>
  arrange(position2023) |>
  kable()
Table 9: Drivers’ results for Suzuka in 2023 and 2024
constructorRef driverRef position2023 position2024
red_bull max_verstappen 1 1
mclaren norris 2 5
mclaren piastri 3 8
ferrari leclerc 4 4
mercedes hamilton 5 9
ferrari sainz 6 3
mercedes russell 7 7
aston_martin alonso 8 6
alpine ocon 9 15
alpine gasly 10 16
rb lawson 11 NA
rb tsunoda 12 10
sauber zhou 13 NA
haas hulkenberg 14 11
haas kevin_magnussen 15 13
aston_martin stroll NA 12
rb ricciardo NA NA
red_bull perez NA 2
sauber bottas NA 14
williams albon NA NA
williams sargeant NA 17
Don’t certain tracks favour certain cars (e.g. due to having fewer slow corners)?

This is a valid objection. We can only really say that this reflects car performance at tracks like Suzuka. Suzuka is a high-speed, high-downforce track with more high-speed than low-speed corners. You can kind of tell by looking that only corners 2, 9, 11, and 14 are slow.

Suzuka Circuit Layout

Data and image from the F1 Formbook.

So perhaps another way of looking at it is that low-speed corners are where drivers make the most difference. That aligns with the excellent recent analysis by Mark Hughes where he explains how drivers like Leclerc and Hamilton make up their time in slow corners.

Naturally certain cars will also be better at slow corners, and that’s not represented at Suzuka. We might be able to gain a better understanding with sector times, but sadly I don’t have this data.

Suzuka-based priors

At this point I’m convinced enough that Suzuka is a useful indicator of relative car performance to use it as a prior, so let’s take the position of each team’s fastest car around Suzuka as the constructor performance priors. Luckily there weren’t any double DNFs - at least one car from each team finished - so every team is represented.

The Stan model is the same; we’re just injecting different prior means and SDs.

Code
suzuka_positions <- suzuka2024 |>
  drop_na() |>
  group_by(constructorRef) |>
  summarise(position = min(position)) |>
  mutate(position = rank(position)) |>
  arrange(constructorRef)

data_list_suzuka <- make_data_list(ctr_positions)
data_list_suzuka$prior_mus <- suzuka_positions$position
data_list_suzuka$prior_sd <- 1

fit_suzuka <- run_stan(data_list_suzuka, "f1.b.stan")

plot_posterior(fit_suzuka)
Figure 9: Posterior samples for the Suzuka-position priors.

It’s not dramatically different. It shouldn’t be: the model and data are the same, and the priors are very similar to the budget-based priors - the fun was in figuring out that Suzuka is car-limited.

Code
plot_posterior_with_prior(ctrs, data_list_suzuka$prior_mus, data_list_suzuka$prior_sd, fit_suzuka)
Figure 10: Posterior samples alongside the Suzuka-position priors.

Most teams didn’t move far from their priors. I’d draw similar conclusions to before: the McLaren was fastest, Red Bull was a close second, and Ferrari were not far behind.

Code
sp <- sample_posterior(fit_suzuka)
sp |>
  group_by(constructorRef) |>
  summarise(mu = mean(position), sigma = sd(position)) |>
  arrange(mu) |>
  kable(digits = 2)
Table 10: Mean and SD from posterior sample, with Suzuka priors.
constructorRef mu sigma
red_bull 2.01 0.31
mclaren 2.05 0.32
ferrari 2.42 0.32
mercedes 3.39 0.39
aston_martin 6.07 0.49
rb 7.47 0.56
haas 7.77 0.58
alpine 8.55 0.64
sauber 9.08 0.52
williams 9.39 0.44

Reintroducing driver skill

Simple model achieved; let’s push on to something more complex, i.e. the model described at the start. Here it is:

data {
  int<lower=1> n_ctrs;
  int<lower=1> n_drvs;
  int<lower=1> n_obs;
  array[n_ctrs] real ctr_mus;
  real<lower=0> ctr_sd;
  array[n_obs] int<lower=1, upper=20> positions;
  array[n_obs, 2] int position_indices;
}
parameters {
  vector[n_ctrs] lambda_ctrs_raw;
  vector[n_drvs] lambda_drvs_raw;
}
transformed parameters {
  vector[n_ctrs] lambda_ctrs;
  vector[n_drvs] lambda_drvs;
  
  // Apply constraints to ensure identifiability
  lambda_ctrs = log(20) * inv_logit(lambda_ctrs_raw);
  lambda_drvs = log(20) * inv_logit(lambda_drvs_raw);
}
model {
  lambda_ctrs_raw ~ normal(ctr_mus, ctr_sd);
  lambda_drvs_raw ~ std_normal();
  
  for (k in 1 : n_obs) {
    int i = position_indices[k, 1];
    int j = position_indices[k, 2];
    positions[k] ~ poisson(exp(lambda_ctrs[i] + lambda_drvs[j]));
  }
}

There is a crucial change. To help make the model identifiable, the raw performance parameters are squashed into (0, 1) with an inverse-logit transform and scaled by log(20). The transformed car and driver parameters are then summed and exponentiated to get the rate for the Poisson distribution.
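
A quick numeric sanity check of that transform in R (plogis() is the inverse logit): with both raw parameters at zero the implied Poisson rate is exp(2 * log(20) * 0.5) = exp(log(20)) = 20, and strongly negative raw values pull it towards 1.

Code
raw <- c(-3, 0, 3)
lambda <- log(20) * plogis(raw) # the transformed-parameters step from the Stan model
round(exp(outer(lambda, lambda, `+`)), 1) # implied Poisson rate for each car/driver pair of raw values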

Code
run_stan2 <- function(data_list, model_file = "f1.c.stan") {
  mod <- cmdstan_model(model_file, exe_file = str_c(model_file, ".bin"))
  mod$sample(data_list, seed = 42, show_messages = FALSE, parallel_chains = 8)
}

ctrs <- sort(unique(f2024$constructorRef))
drvs <- sort(unique(f2024$driverRef))

ctr_index <- tibble(constructorRef = ctrs, ctr_idx = 1:n_distinct(ctrs))
drv_index <- tibble(driverRef = drvs, drv_idx = 1:n_distinct(drvs))

ctr_drv_positions <- f2024 |>
  select(constructorRef, driverRef, position) |>
  arrange(constructorRef, driverRef) |>
  merge(ctr_index) |>
  merge(drv_index) |>
  drop_na()

sample_posterior2 <- function(fit) {
  ctrs <- sort(unique(ctr_drv_positions$constructorRef))
  drvs <- sort(unique(ctr_drv_positions$driverRef))
  df_ctrs <- fit$draws("lambda_ctrs", format = "draws_matrix") |> as_tibble()
  colnames(df_ctrs) <- ctrs

  df_drvs <- fit$draws("lambda_drvs", format = "draws_matrix") |> as_tibble()
  colnames(df_drvs) <- drvs

  list(
    ctrs = df_ctrs |>
      pivot_longer(
        all_of(ctrs),
        names_to = "constructorRef", values_to = "performance"
      ) |>
      mutate(performance = as.double(performance)),
    drvs = df_drvs |>
      pivot_longer(
        all_of(drvs),
        names_to = "driverRef", values_to = "performance"
      ) |>
      mutate(performance = as.double(performance))
  )
}

fit.c <- run_stan2(
  list(
    n_ctrs = n_distinct(ctrs),
    n_drvs = n_distinct(drvs),
    n_obs = nrow(ctr_drv_positions),
    ctr_mus = 0.2 * (-5 + suzuka_positions$position), # shift to centre on 0
    ctr_sd = 0.67, # picked a wide SD because car performance evolved over the season
    positions = ctr_drv_positions$position,
    position_indices = ctr_drv_positions |> select(ctr_idx, drv_idx) |> as.matrix()
  )
)

Let’s first have a look at the constructor performance as determined by the model.

Code
posterior_ctrs <- sample_posterior2(fit.c)$ctrs

ggplot(posterior_ctrs) +
  aes(x = performance, fill = constructorRef) +
  # facet_wrap(~constructorRef) +
  scale_fill_manual(values = ctr_colours$colour) +
  geom_density(alpha = 0.5, colour = "black", linewidth = 0.2, show.legend = TRUE)
Figure 11: Constructor performance parameters determined by model. Lower (left) is better.
Code
posterior_ctrs |>
  group_by(constructorRef) |>
  summarise(mean_performance = mean(performance), sd = sd(performance)) |>
  arrange(mean_performance) |>
  mutate(rank = rank(mean_performance)) |>
  kable(digits = 2)
Table 11: Constructor performance means and SDs from the posterior.
constructorRef mean_performance sd rank
red_bull 0.57 0.19 1
ferrari 0.70 0.21 2
mclaren 0.75 0.23 3
mercedes 0.95 0.26 4
aston_martin 1.18 0.36 5
haas 1.31 0.28 6
rb 1.34 0.34 7
alpine 1.55 0.31 8
williams 1.62 0.30 9
sauber 1.66 0.36 10

Note that the sampled parameter is now latent performance, not positions, so the numbers don’t have a real-world meaning. What matters is relative performance, where a lower number is better.

The model suggests - controversially, but with appropriate uncertainty - that the Red Bull was the fastest car, followed by Ferrari and then the McLaren. However the relative positioning of the other teams seems about right.

Forgive me for not labelling all the drivers on this next plot, but there’s only so much plot fiddling I’m willing to do and it’s usually obvious which is which.

Code
posterior_drvs <- sample_posterior2(fit.c)$drvs |>
  merge(ctr_drvs) |>
  group_by(driverRef) |>
  mutate(x = mean(performance))

ggplot(posterior_drvs) +
  aes(x = performance, fill = driverRef, label = driverRef) +
  facet_wrap(~constructorRef) +
  scale_fill_manual(values = drv_colours$colour) +
  geom_density(alpha = 0.5, colour = "black", linewidth = 0.2, show.legend = FALSE) +
  geom_text(data = posterior_drvs, mapping = aes(x = x, y = 0.75, angle = 90), check_overlap = TRUE)
Figure 12: Driver performance parameters determined by model. Lower (left) is better.
Code
posterior_drvs |>
  group_by(driverRef) |>
  summarise(mean_performance = mean(performance), sd = sd(performance)) |>
  arrange(mean_performance) |>
  mutate(rank = rank(mean_performance)) |>
  kable(digits = 2)
Table 12: Driver performance means and SDs from the posterior.
driverRef mean_performance sd rank
max_verstappen 0.55 0.20 1
leclerc 0.68 0.22 2
norris 0.71 0.24 3
russell 0.78 0.26 4
colapinto 0.81 0.31 5
hamilton 0.83 0.26 6
sainz 0.86 0.22 7
piastri 0.88 0.24 8
gasly 0.89 0.32 9
albon 0.93 0.31 10
ocon 1.03 0.31 11
hulkenberg 1.06 0.29 12
zhou 1.06 0.36 13
bottas 1.08 0.36 14
alonso 1.11 0.36 15
bearman 1.12 0.29 16
tsunoda 1.13 0.34 17
doohan 1.16 0.37 18
sargeant 1.19 0.31 19
ricciardo 1.20 0.34 20
kevin_magnussen 1.25 0.29 21
lawson 1.26 0.35 22
stroll 1.36 0.36 23
perez 1.49 0.21 24

The model suggests that the top drivers performed similarly, with Verstappen fastest. Poor Perez comes dead last, because the model has decided that the Red Bull was the fastest car.

There are some obvious issues though. Alonso is considered only the 15th best performer out of 24, which is absurd. This could be because the Suzuka prior placed Aston Martin ahead of Mercedes whereas they actually finished the season well behind.

If the informative prior is so bad, why don’t we just use uninformative priors? If we do that, the model won’t be able to separate the contributions of the car and the driver. We need a better prior.

There will also be some effect from the way I’ve simply excluded DNFs rather than penalising driver-fault DNFs.

Weaknesses of this model

To an extent I’m just role-playing being Dr Phillips here, so your number one takeaway should be: prefer the F1 Metrics model or the Bell et al. (2016) model, which is a much more sophisticated Bayesian model. If you are seriously interested in estimating driver performance, please check those out.

Nonetheless, there are some obvious weaknesses in this model that it’s worth highlighting.

Use of positions rather than points

Phillips makes a good argument for a non-linear reward function, since it punishes drivers more fairly for DNFs. That would be an improvement here.

No assignment of DNF blame

I’ve just discarded all that data, which flatters the DNF-prone drivers. Though I note that the drivers near the bottom of the performance rankings in my model were more DNF-prone in 2024 anyway, so perhaps there’s a correlation between low performance and DNFs, or those drivers had some races where they wounded the car but finished anyway.
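
If I were to fix this, a sketch might look like the following: a hand-coded dnf_blame table (which doesn’t exist in the Ergast data, so the raceId row below is invented) joined in so that driver-fault DNFs score as last place while mechanical DNFs stay excluded.

Code
dnf_blame <- tribble(
  ~raceId, ~driverRef, ~driver_fault,
  1144, "albon", TRUE # invented example row; the real table needs hand-coding
)

f2024_penalised <- f2024 |>
  left_join(dnf_blame, by = c("raceId", "driverRef")) |>
  mutate(position = case_when(
    !is.na(position) ~ position,
    coalesce(driver_fault, FALSE) ~ 20, # driver-fault DNF scored as last place
    TRUE ~ NA_real_ # mechanical DNF: still excluded
  ))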

Only a single season considered

With more data for each driver - across different cars - the model would be better able to distinguish driver and car performance. Note that it would be necessary to model each constructor by year because the car performance varies so much.

This suggests that we might benefit from having multiple representations of car performance within each year, e.g. splitting each team into A and B specs according to when they brought their largest upgrade. The challenge with that (apart from the tedious business of determining the break points) is that we’d have less data for each car and also there are perhaps several step changes in performance throughout the season.
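
As a flavour of what that might look like, here’s a hypothetical sketch with invented break points (McLaren’s big Miami upgrade is the kind of thing I mean): recode the constructor per "spec" and model that instead.

Code
upgrade_rounds <- tribble(
  ~constructorRef, ~upgrade_round,
  "mclaren", 6, # e.g. the Miami upgrade; the Red Bull break point is invented
  "red_bull", 11
)

f2024_specs <- f2024 |>
  left_join(upgrade_rounds, by = "constructorRef") |>
  mutate(
    spec = ifelse(!is.na(upgrade_round) & round >= upgrade_round, "B", "A"),
    constructorSpec = str_c(constructorRef, "_", spec) # model this instead of constructorRef
  )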

Summing it all up

What did we learn?

  1. How to use Stan to sample the posterior for a Bayesian model in R.
  2. Very little about F1.

Perhaps point 2 is under-selling it a little, since we did learn that Suzuka is quite good for assessing relative constructor performance, and the model turned out some reasonable results for car and driver performance. There are some major caveats of course, but it is the kind of model we could build on. An obvious future step is to look at the other car-limited tracks - hopefully some that are at a different point in the season and have more slow corners than Suzuka - and make the prior a combination of results at these tracks. If I try that out in future I’ll be sure to post an update here.