Analysing Money’s Effect on Football Using Multilevel Regression

Note

An earlier version of this post included a couple of errors and insufficient detail in the section translating the model’s outputs into real-world findings. While I’ve fixed these issues, I won’t be so bold as to claim there are now zero errors in this post. There are fewer than before, though.

Setup Code

# import packages
suppressPackageStartupMessages({
  library(dplyr)
  library(lme4)
  library(gt)
  library(ggplot2)
  library(marginaleffects)
})

# setup fonts
sysfonts::font_add_google("Poppins")
sysfonts::font_add_google("Lora")
showtext::showtext_auto()

# set plot theme
# inspired by https://github.com/z3tt/TidyTuesday/blob/main/R/2020_31_PalmerPenguins.Rmd
theme_set(theme_minimal(base_size = 20, base_family = "Poppins"))
theme_update(
    panel.grid.major = element_line(color = "grey90", linewidth = .4),
    panel.grid.minor = element_blank(),
    panel.spacing.x = unit(.65, units = "cm"),
    panel.spacing.y = unit(.3, units = "cm"),
    axis.title.x = element_text(
      color = "grey30",
      margin = margin(t = 5),
      size = rel(1.05)
    ),
    axis.title.y = element_text(
      color = "grey30",
      margin = margin(r = 5),
      size = rel(1.05)
    ),
    axis.text = element_text(color = "grey50", size = rel(1)),
    axis.text.x = element_text(angle = 30, vjust = 1, hjust = .75),
    axis.ticks = element_line(color = "grey90", linewidth = .4),
    axis.ticks.length = unit(.2, "lines"),
    legend.position = "top",
    legend.title = element_blank(),
    legend.text = element_text(size = rel(.9)),
    legend.box.margin = margin(0, 0, -10, 0),
    legend.key.width = unit(1, units = "cm"),
    plot.title = element_text(
      hjust = 0,
      color = "black",
      family = "Lora",
      size = rel(1.5),
      margin = margin(t = 5, b = 5)
    ),
    plot.subtitle = element_text(
      hjust = 0,
      color = "grey30",
      family = "Lora",
      lineheight = 0.5,
      size = rel(1.1),
      margin = margin(5, 0, 5, 0)
    ),
    plot.title.position = "plot",
    plot.caption = element_text(
      color = "grey50",
      size = rel(0.8),
      hjust = 1,
      margin = margin(10, 0, 0, 0)
    ),
    plot.caption.position = "plot",
    plot.margin = margin(rep(10, 4)),
    strip.text = element_text(size = rel(1), margin = margin(0, 0, 5, 0)),
    strip.clip = "off"
  )

# set table theme
tbl_theme <-
  function(data, width = 100, alignment = "center") {
    data |>
      tab_source_note(
        source_note = "Source: FBref & Transfermarkt Via {worldfootballR}"
      ) |>
      tab_options(
        footnotes.marks = "standard",
        footnotes.spec_ref = "^xb",
        footnotes.spec_ftr = "(x)",
        table.width = pct(width),
        table.align = alignment,
        table.font.names = "Poppins"
      ) |>
      tab_style(
        style = cell_text(align = "left"),
        locations = list(cells_source_notes(), cells_footnotes())
      )
  }

# load data
club_resources <-
  readr::read_rds(
    here::here(
      "blog",
      "2024-10-09-analysing-money-in-football",
      "data",
      "club_resources.rds"
    )
  )

Building a brand is essential in many professional fields. When we finally decide to gig economise all the jobs in an attempt to achieve the goal of implementing a perfectly awful capitalist hellscape, the first to go will be the folks that were too busy “being with their family”, “seeing the world”, or “living a rich and fulfilling life” to build their brand. While some people are blessed with names like Adam L. Ozer¹, those of us who were given names so generic that we are never more than ten metres away from a namesake have to work harder to distinguish ourselves. I could wear a hat, but that is a lot of upkeep. I’d have to buy the hat, then I’d have to wear it, and I assume I’d have to wash it sometimes, too.

No. That won’t do. Building a brand around One Neat Trick is much easier. My One Neat Trick is multilevel models. I recommend them for most problems, usually long before I’m sure they are necessary. I assume that most problems will eventually reveal some multilevel data that justifies using a multilevel model. Once you’ve convinced yourself that this particular hammer does turn everything into a nail, it’s hard to roll that back. I’m fortunate that, in this case, I’m not far from the truth. It’s good that I didn’t pick Tobit regression².

So when I was inspired to take a deeper look at the effect that financial resources have on outcomes in football because a tweet had me mad on the Internet³, I assumed it was probably going to be a multilevel model. I’m interested in understanding how much money impacts football and whether it all comes down to spending your way to the top. I have leveraged Transfermarkt’s squad values as a proxy for a club’s financial power and used outcomes in Europe’s Big Five leagues from 2012/13 to 2023/24 to estimate the effects of resources on league performance. The results suggest that having money is good and that teams with lots of money tend to be better at football (breathtaking insight); there’s some relatively interesting nuance in there, too, so this blog post wasn’t a complete waste of time.

I’m still unsure if this blog post is more about multilevel regression or the money in football, but the beauty of running your own silly little blog is that you can write posts that meander aimlessly. If someone has the misfortune to read this, that’s on them.

Multilevel Models for Multilevel Problems

Multilevel models address problems caused by clustered (or multilevel) data in standard linear (and generalised linear) models, namely, the violation of the assumption of independence.

The independence assumption is a central tenet of regression modelling that states that all residuals in a model should be independent. When data is clustered, observations within clusters will be correlated, leading to residuals that are also correlated (and therefore not independent). Clustered data effectively inflates the sample size⁴, and a model that fails to account for this will underestimate the standard errors of parameter estimates. These smaller standard errors give the appearance of certainty where certainty does not exist.
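
One way to put a rough number on that inflation is the Kish design effect, a standard approximation that is not part of the analysis in this post. With equally sized clusters of m observations and an intraclass correlation of ρ, the effective sample size is n / (1 + (m - 1)ρ). The sketch below uses made-up inputs (100 clusters of 12 observations, ρ = 0.5) purely for illustration.

```r
# Kish design effect: how much clustering shrinks the effective sample size.
# All inputs are illustrative assumptions, not estimates from the data.
design_effect <- function(cluster_size, icc) {
  1 + (cluster_size - 1) * icc
}

n_clusters <- 100    # hypothetical number of clusters (e.g. clubs)
cluster_size <- 12   # hypothetical observations per cluster (e.g. seasons)
icc <- 0.5           # hypothetical intraclass correlation

n_total <- n_clusters * cluster_size
deff <- design_effect(cluster_size, icc)

# Number of independent observations the clustered sample is "worth"
n_effective <- n_total / deff

# For a simple mean, the naive standard error is too small by roughly this factor
se_inflation <- sqrt(deff)
```

Under these assumptions, the nominal sample is several times larger than the effective one, which is why a model that ignores the clustering looks more certain than it should.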

Multilevel models handle clustered data and the correlation between observations within clusters by explicitly modelling the grouping structures in the data. Modelling the grouping structures allows us to fit models at the population level while accounting for the unexplained variance among the groups (Gelman 2006), which produces more appropriate standard errors.

So, when you encounter multilevel data, you need a multilevel model!

The Prevalence of Multilevel Data

My first exposure to multilevel data came when researching political parties and party systems during my Political Science PhD. Being able to say anything meaningful and generalisable about political parties requires studying parties in many countries. It turns out every country is a special snowflake. Finding the golden nuggets of generalisable insight requires combining the valuable information across countries while acknowledging what makes countries and parties unique. Depending on the nature of the question being studied, many other ways exist to group political party data, including time, region, and party family.

Multilevel data can come in various forms, but whenever you find clustering or groups in your data, this is a sign you’re working with multilevel data. If you think that your data could be organised into groups for which observations within-group will be more similar to each other than they are to the rest of the observations, you should be thinking about the problems that arise from clustered data and how you might account for the grouping structures in your data. That doesn’t necessarily have to mean a multilevel model, but multilevel models are certainly one of the best solutions.

Clustering is not just a quirk you occasionally observe in the real world. Multilevel data is ubiquitous. Many phenomena we might be interested in studying can be organised into groups. Multilevel data in the real world is so widespread that McElreath (2017) argues that our starting assumption should be that any data has grouping structures that need to be accounted for and that multilevel regression should be our default choice.

Models with Memory…

The existence of clustering in your data complicates any attempts to model an outcome of interest at the population level, making it necessary to split the population-level variance (the variance across all observations without accounting for grouping structure) into two components - within-group variance (variance across observations within the same group) and between-group variance (variance across groups). The between-group variance estimates how much groups differ from each other, on average, and tells us how much group-level factors influence the outcome. In contrast, the within-group variance, the remainder of the population-level variance after group differences have been accounted for, estimates how much observations differ in a given group, telling us the population-level differences not explained by the group that observation belongs to.
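
The split can be seen in a descriptive way with sums of squares on a toy dataset. This is only a sketch with made-up numbers: a multilevel model estimates the variance components rather than computing them like this, but the bookkeeping (total = between + within) is the same idea.

```r
# Toy data: three groups with four observations each (made-up numbers)
y <- c(10, 12, 11, 13, 20, 22, 21, 23, 30, 28, 31, 29)
group <- rep(c("a", "b", "c"), each = 4)

grand_mean <- mean(y)
group_mean <- ave(y, group)  # each observation's own group mean

# Sums of squares: total variation splits into between and within parts
ss_total <- sum((y - grand_mean)^2)
ss_between <- sum((group_mean - grand_mean)^2)
ss_within <- sum((y - group_mean)^2)

# Share of the total variation that sits between groups
share_between <- ss_between / ss_total
```

In this toy example nearly all of the variation sits between groups, so knowing which group an observation belongs to tells you most of what there is to know about it.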

I think the most intuitive way to understand how multilevel models work, at a conceptual level, is that put forward by Richard McElreath (2023) in the multilevel models chapter of his Statistical Rethinking lectures⁵ (embedded below). While the focus is on Bayesian methods, the early chapters of the video talk in general enough terms to apply to frequentist multilevel models, too.

McElreath (2023) describes multilevel models as “models within models”. You do not fit multiple models simultaneously, but it can be helpful to think about a multilevel model as bringing together the information from models of the different levels at which variance exists. The population-level model (again, a “model” from a conceptual but not technical perspective) estimates population-level effects, much like a single-level model. In contrast, group-level sub-models estimate the group-specific deviations from the population-level effects. The population model serves as a jumping-off point for each sub-model, and in this sense, the population model gives what McElreath describes as “a kind of memory” when fitting the sub-models.

…Learn Faster

When dealing with clustered data, a single-level model that controls for the grouping structure in the data leaves information on the table by treating the groups as entirely independent of each other. To continue to riff off McElreath’s idea of model memory, a single-level model will forget everything it has learned whenever it switches clusters (McElreath 2017). However, clusters of the same type will have common features. For example, in a regression model that estimates how party members behave under certain conditions, party members will be clustered by the political party. While every political party will be different, there will also be inherent similarities that are very useful to factor into a model.

While single-level models leave a lot of information on the table, multilevel models retain information about other clusters by borrowing information from the overall population when estimating group-level effects in a process called “partial pooling” (more on this in the following section). This process causes multilevel models to learn “faster” and more efficiently by leveraging information from across all groups when modelling group-level differences, and it allows multilevel models to reach stable point estimates with less data because groups with smaller sample sizes can rely more heavily on the information borrowed from the population.

All this said, where no obvious clustering is observed in the data, a simpler model can be justified. Where you are not interested in quantifying the clustering effects and just want to account for them to avoid violating the independence assumption, a simpler model with robust standard errors can get you a lot of the way there.

…Resist Overfitting

Partial pooling makes multilevel models both efficient and flexible, but in addition to this, shrinking group-level estimates towards the overall population mean serves as a form of regularisation⁶, striking a balance between underfitting⁷ and overfitting⁸.

This balance is a natural consequence of partial pooling, which is itself a compromise between complete and no pooling. Complete pooling models all observations together, not accounting for group effects, and fits a single global estimate. Complete pooling leads to underfitting because the model is not complex enough to effectively model the variation in the data, with the between-group variance being an essential part of the data-generating process but being ignored. On the other hand, models with no pooling treat groups as independent of each other, fitting separate models for each group. No pooling leads to overfitting, particularly with groups with limited data, because the model doesn’t use information from other groups that might help stabilise estimates. Without that additional context, the model will more likely treat the group-level noise as signal.

The partial pooling process is an “adaptive compromise that achieves regularisation” (McElreath 2023). It balances the risks of underfitting and overfitting by pooling information across the groups, shrinking all group-level estimates towards the overall population mean. The amount of shrinkage depends on how large a group’s sample is and how much variance there is in that group. Groups with less data or high variance will shrink towards the population mean more, meaning extreme group-level estimates are less likely unless the group has a large enough sample and small enough variance to make it justifiable.
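
For the simplest case, a varying-intercept model with no predictors, the partially pooled estimate is approximately a precision-weighted average of the group’s own mean and the population mean. The sketch below is an illustration under strong assumptions: `partial_pool()` is a hypothetical helper, all the numbers are made up, and the variances are treated as known when a real multilevel model would estimate them from the data.

```r
# Partial pooling as a precision-weighted average of the group mean
# and the population mean (variances assumed known for illustration).
partial_pool <- function(group_mean, n, grand_mean, var_within, var_between) {
  w_group <- n / var_within  # precision of the group's own mean
  w_pop <- 1 / var_between   # precision of the population distribution
  (w_group * group_mean + w_pop * grand_mean) / (w_group + w_pop)
}

# Two hypothetical groups with the same raw mean but different sample sizes
small_group <- partial_pool(
  group_mean = 2.0, n = 2, grand_mean = 1.4,
  var_within = 0.25, var_between = 0.04
)
large_group <- partial_pool(
  group_mean = 2.0, n = 40, grand_mean = 1.4,
  var_within = 0.25, var_between = 0.04
)
```

Both estimates land between the population mean (1.4) and the raw group mean (2.0), but the group with only two observations is pulled much further towards the population mean than the group with forty.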

While the motivation for moving away from single-level models when dealing with clustered data is the artificially small standard errors they will estimate, the reason to fall in love with multilevel models as the solution to clustered data is partial pooling. Partial pooling, which serves as a type of memory, makes multilevel models more efficient, faster, and less vulnerable to overfitting, and the consequence of all of this is better point estimates. So, clustered data is everywhere, and when dealing with clustered data, a multilevel model will produce more realistic uncertainty estimates and more accurate point estimates. If you’re not convinced by now, you’re a heathen.

The Role of Money in Football

I’m sure even readers not interested in football know that money makes the goals go around. What is less clear is exactly how much money can impact outcomes on the pitch. To try to answer this question, I will estimate the relationship between a club’s financial resources and league outcomes, primarily focusing on total league points but also considering goal and expected goal (xG) differences. Club resources will be captured using Transfermarkt player values, summed to total squad value, as a proxy for the financial strength of a team.

Transfermarkt’s values for each player are crowdsourced by the nerds that make up the Transfermarkt community, with the goal being that these values will approximate what a player would cost on the open market (not predicting a player’s transfer fee) (Transfermarkt 2021). This process relies on the principle of the “wisdom of crowds” (Surowiecki 2005), which assumes that the crowd can work together to build an estimate of market value that is as good as or better than that of a few experts (Müller, Simons, and Weinmann 2017). Research has shown that Transfermarkt’s player values are a strong predictor of transfer fees (Herm, Callsen-Bracker, and Kreis 2014; Müller, Simons, and Weinmann 2017; Coates and Parshakov 2022) and potentially even a reasonable proxy for player salaries (Prockl and Frick 2018), while reports suggest that these values are even used in the football industry to inform club decision-making (Smith 2021; James 2022). These values are not without their issues, as they tend to underestimate the value of players, and the amount of bias varies between leagues (Müller, Simons, and Weinmann 2017; Coates and Parshakov 2022), but the bias across the Big Five leagues should be relatively small.

The goal is to understand how a club’s financial resources can impact their results on the pitch. The most important mechanism is investing in the squad (though there will be other factors - coaching and support staff, facilities, etc.). Build a better team, get better results! A reliable measure of squad value (and how it changes over time) should give some indication of how the club has invested. Treating squad values as a measure of resources assumes that every team spends, more or less, as much as their finances allow. We know that’s not entirely true, but I think it is an acceptable simplifying assumption. The interest in club resources assumes that clubs spend that money. It’s just a little catchier to talk about money rather than the spending of that money.

Exploring Football’s Multilevel Data

Football has some immediately apparent grouping structures. The promotion/relegation system explicitly organises teams hierarchically! I’m only looking at outcomes in the top divisions, but I am also looking at leagues in five different countries, which will also be a source of some clustering in the data. Since I am looking at outcomes over 12 seasons, the teams themselves will also be a considerable source of clustering because, inevitably, certain teams will do more with their money than others and regularly outperform others in the league.

League Differences

Not accounting for league differences will undervalue the resource advantage that teams like Bayern Munich, PSG, and Juventus have in their leagues while ignoring the significant riches of the Premier League.

These league differences become more apparent when we plot the median squad market values in the Big Five leagues over time.

Plot Code

club_resources |>
  group_by(league, season) |>
  summarise(squad_value = median(squad_value), .groups = "drop") |>
  ggplot(aes(
    forcats::as_factor(season),
    squad_value,
    group = league,
    fill = league
  )) +
  geom_col(position = "dodge", colour = "#343a40") +
  geom_hline(yintercept = 0, colour = "#343a40") +
  scale_fill_manual(
    values = c("#7AB5CC", "#026E99", "#FFA600", "#D93649", "#8C3431")
  ) +
  scale_y_continuous(
    labels = scales::label_number(
      scale_cut = scales::cut_short_scale(),
      prefix = "€"
    )
  ) +
  labs(
    title = "Squad Value in the Big Five Leagues Over Time",
    subtitle = stringr::str_wrap(
      glue::glue(
        "Median average Transfermarkt squad market values per season in each of the ",
        "Big Five leagues in Europe from 2012/13 - 2023/24."
      ),
      width = 93
    ),
    x = NULL,
    y = "Squad Value",
    caption = "Visualisation: Paul Johnson | Data: Transfermarkt Via {worldfootballR}"
  ) +
  theme(legend.key.width = unit(.8, units = "cm"))

Plenty of Premier League teams have astronomical amounts of money when compared even against the rest of the Big Five leagues despite not coming close to mixing with the very richest teams in England. It’s important to account for league differences so that those teams are being compared against their league competition, where their gigantic pot of gold is only a moderately sized pot of gold.

Squad Values

Just being rich isn’t enough. What do rich clubs do with all that money? We expect rich teams to build more valuable squads and that teams valued higher by Transfermarkt will be more successful in the league. The plot below visualises how our three outcomes (points, goal difference, and xG difference) vary by squad market value, all split by league.

Plot Code

club_resources |>
  tidyr::pivot_longer(
    cols = c(pts, xgd, gd),
    names_to = "outcome",
    values_to = "value"
  ) |>
  mutate(
    outcome = factor(
      case_when(
        outcome == "pts" ~ "League Points",
        outcome == "xgd" ~ "xG Difference",
        outcome == "gd" ~ "Goal Difference",
        .default = outcome
      ),
      levels = c("League Points", "Goal Difference", "xG Difference")
    )
  ) |>
  ggplot(aes(squad_value, value)) +
  geom_point(alpha = .4, size = .8, colour = "#343a40") +
  geom_smooth(
    method = lm,
    formula = y ~ log(x),
    colour = "#026E99",
    se = FALSE,
    linewidth = 1.2
  ) +
  facet_grid(rows = vars(outcome), cols = vars(league), scales = "free_y") +
  scale_x_continuous(
    labels = scales::label_number(
      scale_cut = scales::cut_short_scale(),
      prefix = "€"
    )
  ) +
  labs(
    title = "League Outcomes by Squad Value in the Big Five Leagues",
    subtitle = stringr::str_wrap(
      glue::glue(
        "Comparing the association between squad values and league outcomes - ",
        "points, goal difference, and xG difference - in the Big Five leagues ",
        "in Europe across the 2012/13 - 2023/24 seasons."
      ),
      width = 93
    ),
    x = "Squad Value",
    y = NULL,
    caption = "Visualisation: Paul Johnson | Data: FBref & Transfermarkt Via {worldfootballR}"
  ) +
  theme(
    panel.spacing.x = unit(.3, units = "cm"),
    axis.text.x = element_text(angle = 30, vjust = 1, hjust = .75)
  )

A regression plot visualising the association between squad market values and league outcomes. The data is subset by outcome (points, goal difference, and xG difference) and league, producing 15 panels. Each panel shows a positive association between squad values and outcomes. However, the association appears to be non-linear. The regression line fit to each subset of the data includes a log-transformed squad value as the predictor, which fits the data relatively well, though to varying degrees.

The value of a team’s squad positively affects all three outcomes, though that relationship is non-linear. The regression line is fit using a log-transformed squad market value, and it does a reasonably good job of capturing the apparent diminishing returns as squad market values get way out in front of the rest of the league.

It is also worth noting that while the relationship between squad value and the outcomes is similar across all five leagues, there is some variance. This variance is most apparent at the top end of the value ranges. The highest squad values are much larger in some leagues than others, as are the highest total values of each outcome (especially the league points, since these are constrained by a maximum number of points that any team could win, which varies by league).

Season Differences

Finally, we can consider how these effects have changed over time, plotting the relationship between squad values and league points, split by time, below. The darker blue points are earlier seasons in the data, and the lighter grey points are the most recent seasons⁹.

Plot Code

club_resources |>
  ggplot(aes(squad_value, pts, colour = season)) +
  geom_point(alpha = .4, size = 1) +
  geom_smooth(
    method = lm,
    formula = y ~ log(x),
    se = FALSE,
    linewidth = 1,
    alpha = .6
  ) +
  scale_colour_manual(
    values = c(
      "#026E99",
      "#24779F",
      "#3881A6",
      "#498AAC",
      "#5993B2",
      "#699CB7",
      "#79A6BD",
      "#8AAFC2",
      "#9BB8C7",
      "#ACC1CB",
      "#BFCACF",
      "#D2D2D2"
    ),
    guide = "none"
  ) +
  scale_x_continuous(
    labels = scales::label_number(
      scale_cut = scales::cut_short_scale(),
      prefix = "€"
    )
  ) +
  labs(
    title = "League Points by Squad Value Over Time",
    subtitle = glue::glue(
      "Comparing the association between squad market values and total league ",
      "points across<br>the Big Five leagues in Europe, in each season from ",
      "<b style='color:#00537C;'>2012/13</b> - <b style='color:#D2D2D2;'>2023/24</b>."
    ),
    x = "Squad Value",
    y = "League Points",
    caption = "Visualisation: Paul Johnson | Data: FBref & Transfermarkt Via {worldfootballR}"
  ) +
  theme(
    plot.subtitle = ggtext::element_markdown(lineheight = .7)
  )

A regression plot visualising the association between squad market values and league points across the Big Five leagues in each season from 2012/13 to 2023/24. The data is split by season, with a separate regression line fit to each season's data, including a log-transformed squad value as the predictor. While the association between squad value and points is positive throughout the period, the slopes appear to flatten with time, suggesting that increases in squad value are worth fewer league points in recent seasons than earlier in the data.

There is a clear shift over time. Teams have to spend more to increase their points totals now, and the ceiling of squad values has also increased with time.

Estimating Squad Value Effects

Helper Function Code

fit_mlms <-
  function(data) {
    lmer(
      value ~
        1 +
          demean_squad +
          double_mean_club +
          time +
          (1 + demean_squad | league / squad) +
          (1 | season),
      data = data,
      REML = TRUE,
      control = lmerControl(
        optimizer = "bobyqa",
        optCtrl = list(maxfun = 200000)
      )
    )
  }

Model Code

models <-
  club_resources |>
  mutate(
    squad_value = log(squad_value),
    mean_club = mean(squad_value),
    demean_squad = squad_value - mean_club,
    .by = squad
  ) |>
  tidyr::pivot_longer(
    cols = c(pts, xgd, gd),
    names_to = "outcome",
    values_to = "value"
  ) |>
  mutate(
    double_mean_club = mean_club - mean(squad_value),
    value = value / mp,
    time = as.numeric(season) - 1,
    outcome = case_when(
      outcome == "pts" ~ "Points",
      outcome == "gd" ~ "Goal Difference",
      outcome == "xgd" ~ "xG Difference",
      .default = outcome
    )
  ) |>
  filter(!is.na(value)) |>
  tidyr::nest(.by = c(outcome)) |>
  mutate(
    model = purrr::map(data, fit_mlms),
    summary = purrr::map(model, broom.mixed::glance),
    coefs = purrr::map(model, ~ broom.mixed::tidy(.x, conf.int = TRUE)),
    preds = purrr::map(model, ~ predict(.x))
  )

The previous section highlights multiple grouping structures that need to be accounted for in the data, which is fortunate because I’ve already spent a lot of time talking about multilevel models here, and that would have been a real waste of time. This exploratory work also identified that squad values appear to have a non-linear relationship with the three outcomes. The other detail that we can see from the variation within and between groups across the range of squad values is that not only does the mean value vary by league, but also the magnitude of squad value’s effect on the three outcomes varies. Multilevel models can handle both of these types of variation quite easily. When the mean difference is allowed to vary between groups, this is a varying intercepts model, and allowing the magnitude of the effect to vary by group is called a varying slopes model. When your model allows both, it is a varying intercepts & slopes model, and this is the kind of model we will use here. We saw minor but clear differences at the league level, and we know that the most significant differences will occur at the club level (this just would have been a mess to visualise).
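
In lme4 syntax, the difference between these specifications comes down to what sits on the left of the `|` in the group-level term. The sketch below uses `sleepstudy`, a small dataset that ships with lme4, as a stand-in for the football data, so the variables (`Reaction`, `Days`, `Subject`) are only there to show the syntax.

```r
library(lme4)

# sleepstudy: reaction times for 18 subjects measured over 10 days.
# It stands in for the football data purely to illustrate the syntax.

# Varying intercepts: each subject gets their own baseline
m_intercepts <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Varying intercepts & slopes: the effect of Days also differs by subject
m_both <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Group-level standard deviations (and the intercept-slope correlation)
VarCorr(m_intercepts)
VarCorr(m_both)

# Subject-specific coefficients: the Days slope is constant in the first
# model and varies by subject in the second
head(coef(m_intercepts)$Subject)
head(coef(m_both)$Subject)
```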

The regression models in Table 1, fit to the three outcomes (league points, goal difference, and xG difference), all have the same basic structure. All three outcomes have been transformed to a “Per Game” value to account for the shortened season in Ligue 1 during the COVID-19 pandemic. The model includes three population-level explanatory variables: squad value, club mean value, and time. Squad market value has been decomposed into two variables - group-mean centred squad value¹⁰, using clubs as the groups, and the club mean squad value, to effectively capture the within and between effects, respectively (Bafumi and Gelman 2007; Bell, Fairbrother, and Jones 2019; Enders and Tofighi 2007). Both squad value variables have been log-transformed to account for the decreasing gains in league outcomes as squad values increase. An additional variable for time, a continuous variable indexed at 0 (2012/13) and with a maximum of 11 (2023/24), is included. The time variable has been included to account for changes in the distribution of outcomes over time (for example, increasing disparities in outcomes between the best and worst teams). The grouping structures include teams nested within their league and a separate crossed-grouping variable for each season¹¹.
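
The within/between decomposition is easier to see on a tiny example. The sketch below uses a made-up panel of two hypothetical clubs over three seasons; the column names mirror the ones in the model code, but the values are invented.

```r
# Toy panel: two hypothetical clubs over three seasons, values in €m
toy <- data.frame(
  squad = rep(c("Club A", "Club B"), each = 3),
  squad_value = c(100, 120, 150, 400, 500, 450)
)

# Log-transform first, as in the model code
toy$log_value <- log(toy$squad_value)

# Between component: each club's mean log value, centred on the grand mean
toy$mean_club <- ave(toy$log_value, toy$squad)
toy$double_mean_club <- toy$mean_club - mean(toy$log_value)

# Within component: each season's deviation from the club's own mean
toy$demean_squad <- toy$log_value - toy$mean_club

toy
```

The within component sums to zero inside each club, so it only carries information about a club’s seasons relative to its own average, while the between component is constant within a club and only carries information about how clubs compare to each other.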

While the model intercepts are allowed to vary according to each grouping structure, the squad value slopes are specified to vary by the nested league/team grouping. The varying slopes component means that the magnitude of squad values’ effect on outcomes is allowed to vary by league and team. We could have allowed the squad value effects to vary by season as well, given that there did appear to be some flattening of the curve of squad values’ association with league points over time. However, the differences were minor, and allowing the intercepts to vary should be sufficient without inviting unnecessary complexity. In contrast to the population-level time variable (which is intended to capture trends), the season grouping structure mostly captures the fact that outcomes are not independent of each other in a given season. There are a finite number of points in a season, so if Man City win them all, there are none left for anyone else. Similarly, if Man City score 1000 goals, everyone else’s goal difference will be much worse.
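
For anyone less familiar with lme4’s formula shorthand, the nesting operator in the helper function is just a compact way of writing two separate group-level terms. The sketch below only builds the two formula objects side by side and doesn’t fit anything; the expanded version is my reading of how lme4 treats `league / squad`, which matches the `squad:league` labels that show up in the model output.

```r
# Shorthand used in the helper function: squads nested within leagues,
# plus a crossed grouping term for season
f_short <- value ~ 1 + demean_squad + double_mean_club + time +
  (1 + demean_squad | league / squad) +
  (1 | season)

# The same structure written out in full: one term for leagues and
# one for squads within leagues
f_long <- value ~ 1 + demean_squad + double_mean_club + time +
  (1 + demean_squad | league) +
  (1 + demean_squad | squad:league) +
  (1 | season)
```

Writing it out makes it clearer why Table 1 reports an intercept and slope standard deviation for both clubs and leagues, but only an intercept standard deviation for seasons.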

Table Code

cm <-
  c(
    "(Intercept)" = "(Intercept)",
    "demean_squad" = "Squad Value",
    "double_mean_club" = "Club Mean Value",
    "time" = "Time (Seasons)",
    "SD (Intercept squadleague)" = "_Club_: Intercept Std. Dev.",
    "SD (demean_squad squadleague)" = "_Club_: Slope Std. Dev.",
    "SD (Intercept league)" = "_League_: Intercept Std. Dev.",
    "SD (demean_squad league)" = "_League_: Slope Std. Dev.",
    "SD (Intercept season)" = "_Season_: Intercept Std. Dev.",
    "SD (Observations)" = "Residual Std. Dev."
  )

gm <-
  list(
    list("raw" = "nobs", "clean" = "Num. Obs", "fmt" = 0),
    list("raw" = "r2.marginal", "clean" = "R<sup>2</sup> Marginal", "fmt" = 2),
    list(
      "raw" = "r2.conditional",
      "clean" = "R<sup>2</sup> Conditional",
      "fmt" = 2
    ),
    list("raw" = "icc", "clean" = "ICC", "fmt" = 2),
    list("raw" = "rmse", "clean" = "RMSE", "fmt" = 2)
  )

models |>
  pull(model, name = outcome) |>
  modelsummary::msummary(
    statistic = "conf.int",
    gof_map = gm,
    coef_map = cm,
    fmt = 2,
    output = "gt"
  ) |>
  tab_row_group(label = md("**Group-Level Effects**"), rows = 9:14) |>
  tab_row_group(label = md("**Population-Level Effects**"), rows = 1:8) |>
  tab_footnote(
    footnote = "Integers 0-11 (2012/13 = 0; 2023/24 = 11)",
    locations = cells_body(columns = 1, rows = 7)
  ) |>
  tab_footnote(
    footnote = "Log-transformed",
    locations = cells_body(columns = 1, rows = c(3, 5))
  ) |>
  tab_footnote(
    footnote = "Group mean centred",
    locations = cells_body(columns = 1, rows = 3)
  ) |>
  tab_footnote(
    footnote = "Grand mean centred",
    locations = cells_body(columns = 1, rows = 5)
  ) |>
  tab_spanner(columns = 2:4, label = "Outcomes (Per Game)") |>
  fmt_markdown(columns = 1, rows = 9:17) |>
  tab_style(
    style = cell_text(size = "small"),
    locations = cells_body(columns = 2:4, rows = c(2, 4, 6, 8))
  ) |>
  tbl_theme()

Table 1: Multilevel Regressions of Squad Values’ Effect on Season Outcomes

	Outcomes (Per Game)
	Points	Goal Difference	xG Difference
Population-Level Effects
(Intercept)	1.60	0.37	0.06
	[1.42, 1.79]	[0.08, 0.67]	[-0.19, 0.31]
Squad Value^*,†	0.39	0.64	0.42
	[0.32, 0.47]	[0.52, 0.77]	[0.25, 0.59]
Club Mean Value^*,‡	0.52	0.81	0.58
	[0.49, 0.55]	[0.77, 0.85]	[0.54, 0.63]
Time (Seasons)^§	-0.04	-0.07	-0.02
	[-0.05, -0.03]	[-0.08, -0.05]	[-0.04, -0.01]
Group-Level Effects
Club: Intercept Std. Dev.	0.09	0.15	0.16
Club: Slope Std. Dev.	0.15	0.25	0.34
League: Intercept Std. Dev.	0.20	0.32	0.25
League: Slope Std. Dev.	0.05	0.10	0.15
Season: Intercept Std. Dev.	0.04	0.07	0.03
Residual Std. Dev.	0.22	0.33	0.24

Num. Obs	1174	1174	684
R² Marginal	0.63	0.63	0.60
R² Conditional	0.82	0.84	0.85
ICC	0.52	0.57	0.62
RMSE	0.21	0.30	0.21
Source: FBref & Transfermarkt Via {worldfootballR}
(*) Log-transformed
(†) Group mean centred
(‡) Grand mean centred
(§) Integers 0-11 (2012/13 = 0; 2023/24 = 11)

Population-Level Effects

The intercept values for each model in Table 1 represent the predicted value for each outcome in the 2012/13 season¹² for a club whose squad value equals its own mean squad value across all seasons and whose mean squad value equals the grand mean squad value. The Squad Value coefficient represents the within-club effect, while the Club Mean Value coefficient is the between-club effect. The within-club effect estimates how increases in any given club’s squad value, relative to their average squad value, impact their outcomes on the pitch. In contrast, the between-club effect estimates how much a difference in a club’s average squad value, relative to the grand mean squad value, impacts outcomes, which quantifies how differences in average squad value between clubs translate into differences in their outcomes.
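To make the within/between split concrete, here is a minimal sketch of how the two centred predictors could be built. The toy data and the `log_value`, `club_mean`, and `grand_mean` columns are hypothetical, and I am assuming the centring mirrors the back-transformation used in the plot code later in the post (`demean_squad` and `double_mean_club` are the model's variable names).

```r
library(dplyr)

# Hypothetical squad values (in €m) for two made-up clubs over three seasons
toy <- tibble(
  squad = rep(c("Club A", "Club B"), each = 3),
  squad_value = c(100, 120, 150, 400, 500, 450)
)

centred <- toy |>
  mutate(
    log_value = log(squad_value),
    # grand mean of log squad value across every observation
    grand_mean = mean(log_value)
  ) |>
  mutate(
    # each club's average log squad value across its seasons
    club_mean = mean(log_value),
    # within-club: each season relative to the club's own average
    demean_squad = log_value - club_mean,
    # between-club: the club's average relative to the grand mean
    double_mean_club = club_mean - grand_mean,
    .by = squad
  )

# adding the pieces back together recovers the original squad values
rebuilt <- centred |>
  mutate(rebuilt_value = exp(demean_squad + double_mean_club + grand_mean))

rebuilt
```

The within-club predictor sums to zero inside each club, so it only carries information about a club's seasons relative to its own norm, while the between-club predictor is constant within a club.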

Both squad value predictors are log-transformed, which means their coefficients are not directly interpretable beyond the direction of the effect. However, we can translate both coefficients to a real-world value. Using the formula $38 \times \beta \times \log(1.10)$, where $\beta$ represents the Squad Value or Club Mean Value coefficient, we can estimate how a 10% increase in a club’s squad value¹³ (for the average club, this is an increase of ~€24m¹⁴) impacts outcomes over a 38-game season. Every 10% increase in the value of a club’s squad, the within-club effect, boosts points by an average of ~1.4 points over 38 games, while it also leads to a ~1.5 increase in xG difference and ~2.3 in goal difference. Transforming the between-club effect, on the other hand, estimates the difference in outcomes over a 38-game season between two clubs where one has a 10% higher average squad value across the dataset than the other. The model estimates that a club with a 10% greater mean average squad value over multiple seasons will win an average of ~1.9 more points in any given 38-game season, while their xG and goal differences will be ~2.1 and ~2.9 greater, respectively.
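The arithmetic behind these figures can be sketched in a few lines of R. The coefficients are copied from Table 1, while `season_effect()` is a hypothetical helper written for illustration rather than part of the original analysis. Because the outcome is linear in log squad value, multiplying squad value by 1.10 shifts the per-game outcome by $\beta \times \log(1.10)$.

```r
# Coefficients copied from Table 1 (per-game outcomes)
within_beta <- c(points = 0.39, goal_diff = 0.64, xg_diff = 0.42)
between_beta <- c(points = 0.52, goal_diff = 0.81, xg_diff = 0.58)

# change in a season total for a given % increase in squad value
season_effect <- function(beta, pct_increase = 10, games = 38) {
  games * beta * log(1 + pct_increase / 100)
}

round(season_effect(within_beta), 1)
round(season_effect(between_beta), 1)
```

Changing `pct_increase` or `games` gives the equivalent figures for other scenarios, such as a 34-game season.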

Finally, using the formula $(e^{\frac{3}{38 \times \beta}} - 1) \times 100$, we can estimate the percentage increase in a club’s squad value that is required to win three additional points, on average, over a 38-game season. Using the within-club points coefficient, the model predicts that an increase of ~22.4% in a club’s squad value is, on average, equivalent to an extra win.
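This formula comes from solving $38 \times \beta \times \log(1 + p) = 3$ for $p$. A small sketch, using a hypothetical helper named `pct_needed()` and the within-club points coefficient from Table 1:

```r
# % increase in squad value needed to gain `target` points over a season,
# found by solving games * beta * log(1 + p) = target for p
pct_needed <- function(beta, target = 3, games = 38) {
  (exp(target / (games * beta)) - 1) * 100
}

# within-club points coefficient from Table 1
round(pct_needed(0.39), 1)
```

Plugging the result back into the 10%-style formula (with the new percentage in place of 10) should return the three-point target, which is a handy sanity check.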

Group-Level Effects

Group-level effects estimate the amount that group means vary around the grand mean. At a club level, intercept and slope effects effectively capture two distinct sources of over/underperformance.

Club-level intercept estimates capture variability in baseline performance between clubs. This highlights structural differences at the club level that are independent of squad value and persistent across time. One standard deviation around the points intercept is worth 0.09 points per game, which amounts to $\pm 3.42$ points over a 38-game season. The largest club-specific intercepts belong to Manchester City and Liverpool, but a varied set of clubs rank highly. The strongest performers include some of the world’s biggest teams, but Union Berlin come in third, and VfL Bochum and Bastia are in the top 10. This suggests that money does not explain structural advantages (at least not entirely). It could be institutional knowledge, better facilities (training ground, medical staff, analytics), or manager effects. Perhaps teams are paying the refs off (especially when they are playing your team). Rich clubs will have more money to invest in these areas, but well-run clubs may have opportunities to make efficient gains. I imagine paying off a referee costs less than buying a world-class striker.

On the other hand, club-level slope estimates capture variability in the effect of squad value changes on performance across clubs. Positive club-specific estimates suggest a club has more effectively turned increases in the value of their squad into improvements in outcomes on the pitch¹⁵. The standard deviation around the points slope (0.15) indicates that a club could gain as much as 5.7 additional points in a 38-game season if they are smart when building their squad. Inter Milan have the largest club-specific slope, followed by Man City and Liverpool, while Stuttgart and Lens round out the top five. In this case, the strongest performers are even more mixed, indicating that it’s not about how much money a team spends but how effectively they spend it¹⁶.
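As a rough sketch of how the club-level standard deviations translate into season totals, the snippet below scales the points model's values from Table 1 to 38 games. The second half is my own illustrative assumption about how to read the slope: because it sits on the log scale, it compares a 10% rise in squad value for an average club against a club whose slope is one standard deviation higher.

```r
# Club-level standard deviations for the points model, copied from Table 1
club_sd <- c(intercept = 0.09, slope = 0.15)

# scale from points per game to a 38-game season
season_sd <- club_sd * 38
season_sd

# the slope is on the log scale, so for a 10% rise in squad value compare
# an average club with one whose slope is one standard deviation higher
beta_within <- 0.39
gain_10pct <- 38 * c(
  average = beta_within,
  plus_one_sd = beta_within + club_sd[["slope"]]
) * log(1.10)

round(gain_10pct, 1)
```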

Finally, league-level effects capture structural differences primarily caused by financial disparities between the Big Five leagues. League intercept effects represent variability driven by the fact that a squad of average value would be more or less competitive depending on the league. In contrast, the league slope effects capture differences in the impact that increases in squad values have between leagues. Ligue 1 has the largest positive intercept, indicating that a squad of average value would be expected to win more points in France. Serie A has the largest positive slope, indicating that increases in squad values can go further in Italy. In contrast, the Premier League has the largest negative slope and intercept because money doesn’t go as far in England’s top flight!

Predicting League Points

While it’s useful (and hopefully interesting) to understand how squad values, decomposed into within- and between-club effects, impact outcomes, and how these effects vary at a club- and league-level, I think it’s a little difficult to piece all this together conceptually and develop a complete picture of the effect that financial resources (with squad value used as a proxy) have on outcomes. This is an inevitable difficulty with complex model structures. It becomes increasingly difficult to translate the many moving parts into a simple, intuitive understanding of the effect. Computing predictions using the fitted multilevel model will tell us more. I will focus on league points because I think that’s a little more interesting than looking at goal or xG difference¹⁷.

The 'It's One Banana Michael' meme from Arrested Development, with Lucille Bluth saying 'I mean, it's one point, Michael? What could it cost? 100 million dollars?'

We can start by looking at the predicted league points across all of the leagues by squad values over the last three seasons.

Helper Function Code (Click to Expand)

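overall_preds <-
  function(model) {
    # population-level predictions (re.form = NA drops group-level effects)
    # for the last three seasons in the data
    predictions(
      model,
      re.form = NA,
      variables = list(
        "time" = 9:11,
        season = c("2021/22", "2022/23", "2023/24")
      )
    )
  }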

Plot Code (Click to Expand)

models |>
  filter(outcome == "Points") |>
  select(model) |>
  rowwise() |>
  mutate(preds = list(overall_preds(model))) |>
  tidyr::unnest(preds) |>
  mutate(
    across(c(double_mean_club, demean_squad), ~ as.numeric(as.character(.x))),
    double_mean_club = double_mean_club + mean(log(club_resources$squad_value)),
    squad_value = exp(demean_squad + double_mean_club),
    across(c(estimate, conf.low, conf.high), ~ .x * 38)
  ) |>
  ggplot(aes(squad_value, estimate)) +
  geom_smooth(
    method = lm,
    formula = y ~ log(x),
    se = FALSE,
    linewidth = 1,
    colour = "#343a40"
  ) +
  geom_smooth(
    aes(y = conf.low),
    method = lm,
    formula = y ~ log(x),
    se = FALSE,
    linewidth = 0.8,
    colour = "#343a40",
    linetype = "dashed"
  ) +
  geom_smooth(
    aes(y = conf.high),
    method = lm,
    formula = y ~ log(x),
    se = FALSE,
    linewidth = 0.8,
    colour = "#343a40",
    linetype = "dashed"
  ) +
  scale_x_continuous(
    labels = scales::label_number(
      scale_cut = scales::cut_short_scale(),
      prefix = "€"
    )
  ) +
  labs(
    title = "Predicted Points by Squad Value Across the Big Five Leagues",
    subtitle = stringr::str_wrap(
      glue::glue(
        "Conditional adjusted predicted league points by squad market values, ",
        "across the Big Five leagues in Europe, from 2021/22 - 2023/24. Predicted ",
        "points calculated by multiplying points per game by 38 to reflect the ",
        "total points for a 38-game season."
      ),
      width = 95
    ),
    x = "Squad Value",
    y = "Predicted Points",
    caption = "Visualisation: Paul Johnson | Data: FBref & Transfermarkt Via {worldfootballR}"
  )

A regression plot visualising the relationship between squad market values and the model's predicted points across the Big Five leagues in the last three seasons (2021/22 - 2023/24). The plot includes a fitted regression line and 95% confidence intervals, showing that increases in the lowest squad values lead to steep gains in predicted points, with squad values up to around €200m leading to predicted points increasing from close to zero up to around 50 points. While further increases in squad value lead to higher predicted points totals, it takes squad values of over €750m to reach 75 points.

The marginal gains for increases in squad value are the largest at the lower end of the values, appearing to start flattening out somewhere around the €200m point. There is a ton of value to be had in increasing squad value at the lower end of the leagues, but once a team pushes for 50+ points, further spending becomes less efficient.
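The helper function for this plot passes `re.form = NA`, which asks for population-level predictions that ignore the club, league, and season effects. As a minimal sketch of what that argument does, the example below uses the `sleepstudy` data that ships with {lme4} as a stand-in for the football data (the model here is purely illustrative).

```r
library(lme4)

# sleepstudy ships with lme4; it stands in here for the football data
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# population-level predictions: group-level effects set to zero
pop_preds <- predict(fit, newdata = sleepstudy, re.form = NA)

# conditional predictions: include each subject's intercept and slope
cond_preds <- predict(fit, newdata = sleepstudy, re.form = NULL)

# population-level predictions are just the fixed-effects part of the model
fixed_part <- fixef(fit)[["(Intercept)"]] +
  fixef(fit)[["Days"]] * sleepstudy$Days

all.equal(unname(pop_preds), fixed_part)
```

The conditional predictions differ from subject to subject for the same number of days, while the population-level predictions do not, which is why the overall plot shows a single line.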

We can also compare how increases in squad value increase predicted points conditional on the leagues. We will use marginal predictions, which calculate the effect of increases in squad values averaged within each league, plotted below.

Helper Function Code (Click to Expand)

league_preds <-
  function(model) {
    # average predictions within each combination of squad value and league
    predictions(
      model,
      by = c("demean_squad", "double_mean_club", "league")
    )
  }

Plot Code (Click to Expand)

models |>
  filter(outcome == "Points") |>
  select(data, model) |>
  rowwise() |>
  mutate(preds = list(league_preds(model))) |>
  tidyr::unnest(preds) |>
  mutate(
    across(c(double_mean_club, demean_squad), ~ as.numeric(as.character(.x))),
    double_mean_club = double_mean_club + mean(log(club_resources$squad_value)),
    squad_value = exp(demean_squad + double_mean_club),
    estimate = case_when(
      league %in% c("Bundesliga", "Ligue 1") ~ estimate * 34,
      league %in% c("Premier League", "La Liga", "Serie A") ~ estimate * 38
    )
  ) |>
  ggplot(aes(squad_value, estimate, colour = league)) +
  geom_smooth(
    method = lm,
    formula = y ~ log(x),
    se = FALSE,
    alpha = .8,
    linewidth = 1
  ) +
  scale_colour_manual(
    values = c("#7AB5CC", "#026E99", "#FFA600", "#D93649", "#8C3431")
  ) +
  scale_x_continuous(
    labels = scales::label_number(
      scale_cut = scales::cut_short_scale(),
      prefix = "€"
    )
  ) +
  labs(
    title = "Predicted Points by Squad Value in the Big Five Leagues",
    subtitle = stringr::str_wrap(
      glue::glue(
        "Marginal adjusted predicted league points, averaged over squad ",
        "market values in each of the Big Five leagues in Europe from 2012/13 ",
        "- 2023/24. Predicted points calculated by multiplying points per game ",
        "by the total games in each league's season."
      ),
      width = 95
    ),
    x = "Squad Value",
    y = "Predicted Points",
    caption = "Visualisation: Paul Johnson | Data: FBref & Transfermarkt Via {worldfootballR}"
  )

A regression plot visualising the relationship between squad market values and the model's predicted points in the Big Five leagues from 2012/13 - 2023/24. The plot includes fitted regression lines for each league. The five regression lines are similar in shape. However, the Premier League slope is more gradual with a smaller intercept, suggesting that Premier League teams need more expensive squads to compete and increases in squad value are worth fewer points. The Premier League line also extends further on the x-axis because the most expensive squads in the Premier League are worth significantly more than the rest of the leagues. The intercepts for the other four leagues are very similar, but the Bundesliga and Ligue 1 tail off as squad value increases due to playing fewer games in a season and, therefore, having fewer available points that the increases in squad value could gain. Finally, La Liga and Serie A have similar slopes. The endpoint of the La Liga line is further along the x-axis due to the significantly larger Real Madrid and Barcelona squad values.

The predicted points lines for each league are relatively similar (accounting for the four fewer games per season in the Bundesliga & Ligue 1). These lines show how each league’s intercept and slope vary around the grand mean intercept and slope (which is effectively what the previous plot shows).

However, there are some interesting details around the margins. The Premier League intercept is a little lower than the other four leagues, which is to be expected because the average value is much higher and, therefore, the base level required to win any points at all is higher. While the Premier League’s predicted points line catches up with the Bundesliga and Ligue 1, this is only due to the fewer available total points in those two leagues per season. The Premier League slope is not as steep, and once all the predicted points lines flatten out, the Premier League remains more or less parallel to the other 38-game-season leagues.

Identifying Performance Above/Below Expectations

All this is very interesting, but the real question is, “How can I use this against my enemies?” Well, we can compute the predicted points for each team and compare these predictions against their actual points totals each season. If your enemies are big dumb idiots, they should be underperforming their predicted points consistently. If all your enemies are King Curtis, maybe you’re the problem…

Performances above/below the model’s expectations for the Premier League’s “top six” from 2012/13 to 2023/24 are plotted below.

Plot Code (Click to Expand)

models |>
  filter(outcome == "Points") |>
  tidyr::unnest(c(data, preds)) |>
  mutate(
    value = value * mp,
    preds = round(preds * mp)
  ) |>
  tidyr::pivot_longer(
    cols = c(value, preds),
    names_to = "type",
    values_to = "points"
  ) |>
  mutate(
    type = case_when(
      type == "value" ~ "Total Points",
      type == "preds" ~ "Predicted Points",
      .default = type
    )
  ) |>
  filter(
    squad %in%
      c(
        "Manchester City",
        "Manchester Utd",
        "Liverpool",
        "Arsenal",
        "Chelsea",
        "Tottenham"
      )
  ) |>
  ggplot(aes(season, points, group = type, linetype = type)) +
  geom_smooth(
    method = lm,
    formula = y ~ splines::ns(x, 3),
    linewidth = 0.5,
    se = FALSE,
    colour = "#343a40"
  ) +
  geom_point(aes(fill = type), shape = 21, size = 1.2, stroke = 1) +
  guides(fill = guide_legend(override.aes = list(size = 2))) +
  scale_fill_manual(values = c("white", "#343a40")) +
  scale_linetype_manual(values = c("dashed", "solid")) +
  scale_x_discrete(
    expand = c(0.05, 0.05),
    breaks = c("2013/14", "2015/16", "2017/18", "2019/20", "2021/22", "2023/24")
  ) +
  facet_wrap(facets = vars(squad), nrow = 3) +
  labs(
    title = "Premier League Top Six's Performances Above/Below Expectations",
    subtitle = stringr::str_wrap(
      glue::glue(
        "Comparing Arsenal, Chelsea, Liverpool, Man City, Man Utd, & Spurs's ",
        "total and predicted points in the Premier League from 2012/13 to ",
        "2023/24, conditional on squad market values per season."
      ),
      width = 95
    ),
    x = NULL,
    y = NULL,
    caption = "Visualisation: Paul Johnson | Data: FBref & Transfermarkt Via {worldfootballR}"
  )

A time-series plot visualising Arsenal, Chelsea, Liverpool, Man City, Man United, and Spurs's total and predicted points each season from 2012/13 to 2023/24, including a fitted line that visualises the trends in predicted and total points. Arsenal significantly underperformed expectations midway through the period but in recent seasons have experienced similarly significant overperformances, while Liverpool and Manchester City have consistently overperformed expectations recently. Manchester United and Tottenham have performed above expectations in earlier seasons but have otherwise met expectations. Finally, Chelsea have the largest swings, with massive overperformances in the early seasons and similarly significant underperformances in other seasons.

These results seem to align with what I would have expected, for the most part (though Chelsea’s early years are pretty chaotic). The exception is Manchester United, who have, miraculously, performed close to expectations over the period. Any model that doesn’t paint United as a gang of dumb idiots in the post-Fergie years is getting at least one thing wrong.

I think this highlights a flaw in the methodology used here, particularly around the operationalisation of “squad market values” as a proxy for the club’s financial clout. A critical assumption underpinning the use of squad values in this model is that teams are approximately smart enough to spend their money on the best players they can afford. The intelligent teams will get the better deals, and the dummies will overspend, but the spending will at least appear reasonable enough to reflect in the value of the squad. Over and underperformance can then be assumed to represent the teams that have been smart versus those that have misspent (though, in reality, it will be a function of other factors, too). In Manchester United’s case, they are not performing significantly above or below expectations based on their squad value, but this is likely because their squad building has been so poor that the Transfermarkt squad values don’t capture the amount they’ve spent on new players. The valuations are simply misaligned with the fees United paid.

The Biggest Over & Underperformers

Finally, let’s sort the penny-wise from the (billion) pound foolish. Table 2 and Table 3 include the top three teams in each league for overperformance and underperformance, respectively, calculated as each team’s total points as a percentage above or below their total predicted points.
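As a minimal sketch of the over/underperformance calculation, the example below uses made-up clubs and points totals (everything in `toy_points` is hypothetical). It mirrors the percentage difference between actual and predicted points per game, then picks the top and bottom club per league.

```r
library(dplyr)

# Hypothetical season totals for three made-up clubs
toy_points <- tibble(
  squad = c("Club A", "Club B", "Club C"),
  league = "Toy League",
  mp = c(38, 38, 38),
  value = c(60, 40, 45), # actual points
  preds = c(50, 44, 45) # model-predicted points
)

toy_perf <- toy_points |>
  mutate(
    avg_ppg = value / mp,
    pred_ppg = preds / mp,
    # positive = overperformance, negative = underperformance
    pct_diff = (avg_ppg / pred_ppg) - 1
  )

top_over <- slice_max(toy_perf, order_by = pct_diff, n = 1, by = league)
top_under <- slice_min(toy_perf, order_by = pct_diff, n = 1, by = league)

top_over
top_under
```

Passing `slice_max` or `slice_min` as a function argument, as the table function does, lets one piece of code produce both tables.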

Function Code (Click to Expand)

performance_table <-
  function(min_max) {
    models |>
      filter(outcome == "Points") |>
      tidyr::unnest(c(data, preds)) |>
      filter(n() > 3, .by = squad) |>
      mutate(
        value = value * mp,
        preds = round(preds * mp)
      ) |>
      summarise(
        mp = sum(mp),
        avg_ppg = sum(value) / mp,
        pred_ppg = sum(preds) / mp,
        pct_diff = (avg_ppg / pred_ppg) - 1,
        .by = c(squad, league)
      ) |>
      min_max(order_by = pct_diff, n = 3, by = league) |>
      select(squad, league, avg_ppg, pred_ppg, pct_diff) |>
      gt(groupname_col = "league", rowname_col = "squad") |>
      cols_label(
        avg_ppg = "Average",
        pred_ppg = "Predicted",
        pct_diff ~ "Over/Under"
      ) |>
      tab_spanner(
        label = "Points Per Game",
        columns = c(avg_ppg, pred_ppg)
      ) |>
      fmt_number(columns = c(avg_ppg, pred_ppg, pct_diff), decimals = 2) |>
      fmt_percent(columns = pct_diff) |>
      cols_align(align = "center", columns = c(avg_ppg, pred_ppg, pct_diff)) |>
      tab_style(
        style = cell_text(weight = "bold"),
        locations = cells_row_groups()
      ) |>
      tbl_theme()
  }


Table Code (Click to Expand)

performance_table(slice_max)

Table 2: The Biggest Overperformers in the Big Five Leagues

	Points Per Game		Over/Under
	Average	Predicted	Over/Under
Premier League
Manchester City	2.25	2.14	5.02%
Burnley	1.05	1.00	4.59%
Stoke City	1.20	1.15	4.18%
La Liga
Cádiz	1.04	0.88	18.80%
Girona	1.43	1.32	8.46%
Eibar	1.14	1.05	7.86%
Ligue 1
Bastia	1.19	1.06	12.94%
Guingamp	1.14	1.04	9.28%
Lens	1.52	1.43	6.39%
Bundesliga
Union Berlin	1.43	1.21	17.96%
Bayern Munich	2.38	2.32	2.64%
Augsburg	1.14	1.12	1.97%
Serie A
Chievo	1.03	0.96	7.03%
Juventus	2.21	2.12	4.14%
Lazio	1.72	1.67	3.01%
Source: FBref & Transfermarkt Via {worldfootballR}

Table Code (Click to Expand)

performance_table(slice_min)

Table 3: The Biggest Underperformers in the Big Five Leagues

	Points Per Game		Over/Under
	Average	Predicted	Over/Under
Premier League
Fulham	1.00	1.09	−8.43%
Sunderland	0.94	1.01	−7.29%
Norwich City	0.81	0.87	−7.23%
La Liga
Deportivo de La Coruña	0.93	1.01	−7.33%
Valencia	1.43	1.52	−6.20%
Almería	0.86	0.91	−5.76%
Ligue 1
Troyes	0.79	0.87	−9.09%
Toulouse	1.11	1.17	−5.15%
Metz	0.95	1.00	−4.38%
Bundesliga
Schalke	1.31	1.41	−7.11%
Hannover 96	1.02	1.10	−7.11%
Hamburger SV	1.08	1.15	−5.98%
Serie A
Palermo	0.96	1.02	−5.81%
Parma	1.04	1.09	−4.84%
Genoa	1.10	1.15	−4.37%
Source: FBref & Transfermarkt Via {worldfootballR}

The biggest overperformers across all five leagues are Cádiz (18.80%), Union Berlin (17.96%), and Bastia (12.94%). Interestingly, there is a decent mix of teams in Table 2. While the biggest overperformers are promoted teams that defied the odds, Manchester City, Juventus, and Bayern Munich show there are a few different ways to beat the model!

On the other side, the biggest underperformers are a little less dramatic, with Troyes (-9.09%), Fulham (-8.43%), and Deportivo de La Coruña (-7.33%) edging out the competition to top the table. Unlike Table 2, the vast majority of the teams in Table 3 are expected to struggle in the league but end up underperforming their already low expectations according to their squad values¹⁸. The only exceptions are Valencia and Schalke, who both fell on hard times financially despite being big hitters in their respective leagues.

The best and worst of the bunch are plotted below to illustrate what the largest over and underperformances look like (though I’ve selected the two underperformers I found most interesting since it was so close and I’m a real selfish guy).

Plot Code (Click to Expand)

models |>
  filter(outcome == "Points") |>
  tidyr::unnest(c(data, preds)) |>
  mutate(
    value = value * mp,
    preds = round(preds * mp)
  ) |>
  tidyr::pivot_longer(
    cols = c(value, preds),
    names_to = "type",
    values_to = "points"
  ) |>
  mutate(
    type = case_when(
      type == "value" ~ "Total Points",
      type == "preds" ~ "Predicted Points",
      .default = type
    )
  ) |>
  filter(
    squad %in%
      c(
        "Valencia",
        "Schalke",
        "Union Berlin",
        "Cádiz"
      )
  ) |>
  ggplot(aes(season, points, group = type, linetype = type)) +
  geom_smooth(
    method = lm,
    formula = y ~ splines::ns(x, 3),
    linewidth = 0.5,
    se = FALSE,
    colour = "#343a40"
  ) +
  geom_point(aes(fill = type), shape = 21, size = 1.2, stroke = 1) +
  guides(fill = guide_legend(override.aes = list(size = 2))) +
  scale_fill_manual(values = c("white", "#343a40")) +
  scale_linetype_manual(values = c("dashed", "solid")) +
  scale_x_discrete(
    expand = c(0.05, 0.05),
    breaks = c("2013/14", "2015/16", "2017/18", "2019/20", "2021/22", "2023/24")
  ) +
  facet_wrap(facets = vars(squad), nrow = 2, scales = "free_x") +
  labs(
    title = "Significant Over/Underperformers Across the Big Five Leagues",
    subtitle = stringr::str_wrap(
      glue::glue(
        "Comparing Cádiz, Schalke, Union Berlin, and Valencia's total and ",
        "predicted points in their respective leagues from 2012/13 to ",
        "2023/24, conditional on squad market values per season."
      ),
      width = 95
    ),
    x = NULL,
    y = NULL,
    caption = "Visualisation: Paul Johnson | Data: FBref & Transfermarkt Via {worldfootballR}"
  )

A time-series plot visualising Cádiz, Schalke, Union Berlin, and Valencia's total and predicted points each season from 2012/13 to 2023/24, including a fitted line that visualises the trends in predicted and total points. Cádiz were promoted in 2020/21 and overperformed their predicted points every season until being relegated last season, while Union Berlin have had a similar experience, being promoted in 2019/20 and massively overperforming expectations until a significant underperformance last season (though they avoided relegation). Schalke and Valencia, on the other hand, have consistently underperformed while also seeing a significant decline in their predicted points.

Wrapping Up

Several thousand words later, we can conclude that the rich stay winning. Don’t worry, though. The evidence also shows that success isn’t all about money. It’s possible to overperform expectations, even at the top of the table, and some clubs manage this consistently. Like the rest of the world, football is a meritocracy, after all. If you’re rich and stupid, you only consistently finish above everyone else except those who are both rich and not dumb as rocks. It’s heartwarming stuff.

I started this blog post wondering whether I was writing a multilevel regression tutorial using football data as an example or an analysis of the money in football using a multilevel model. I still don’t know which it is. Maybe this is a Choose Your Own Adventure blog post for nerds? Whichever adventure you chose, I hope you enjoyed it. And if you didn’t, please don’t write mean things about me on the Internet.

In the interest of overpromising and underdelivering, I have the lofty goals of a follow-up blog post that recreates some of this and potentially builds on it further, using Bayesian methods. I may build a Bayesian multilevel model that looks at changes in squad value from season to season. It remains to be seen if it will take me two years to finish like this one.

Acknowledgments

Many thanks to Camilo Alvarez and Adam Ozer for their helpful feedback during the development of this blog post. I greatly appreciate anyone who helps me be just a little less stupid.

Preview image by Robert Anasch on Unsplash.

Support

If you enjoyed this blog post and would like to support my work, you can buy me a coffee or a beer or give me a tip as a thank you.

References

Bafumi, Joseph, and Andrew Gelman. 2007. “Fitting Multilevel Models When Predictors and Group Effects Correlate.” Available at SSRN 1010095.

Bell, Andrew, Malcolm Fairbrother, and Kelvyn Jones. 2019. “Fixed and Random Effects Models: Making an Informed Choice.” Quality & Quantity 53: 1051–74.

Coates, Dennis, and Petr Parshakov. 2022. “The Wisdom of Crowds and Transfer Market Values.” European Journal of Operational Research 301 (2): 523–34.

Enders, Craig K, and Davood Tofighi. 2007. “Centering Predictor Variables in Cross-Sectional Multilevel Models: A New Look at an Old Issue.” Psychological Methods 12 (2): 121.

Gelman, Andrew. 2006. “Multilevel (Hierarchical) Modeling: What It Can and Cannot Do.” Technometrics 48 (3): 432–35. http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf.

Herm, Steffen, Hans-Markus Callsen-Bracker, and Henning Kreis. 2014. “When the Crowd Evaluates Soccer Players’ Market Values: Accuracy and Evaluation Attributes of an Online Community.” Sport Management Review 17 (4): 484–92. https://www.sciencedirect.com/science/article/abs/pii/S144135231300096X.

James, Stuart. 2022. “How Do You Value a Player?” The Athletic. https://www.nytimes.com/athletic/3085749/2022/01/27/premier-league-how-do-you-value-a-player/.

McElreath, Richard. 2017. “Multilevel Regression as Default.” https://elevanth.org/blog/2017/08/24/multilevel-regression-as-default/.

———. 2018. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman and Hall/CRC.

———. 2023. “Statistical Rethinking 2023 - 12 - Multilevel Models.” YouTube. https://youtu.be/iwVqiiXYeC4?si=YL4re2wbpJHR9Aft.

Müller, Oliver, Alexander Simons, and Markus Weinmann. 2017. “Beyond Crowd Judgments: Data-Driven Estimation of Market Value in Association Football.” European Journal of Operational Research 263 (2): 611–24. https://www.sciencedirect.com/science/article/pii/S0377221717304332.

Prockl, Franziska, and Bernd Frick. 2018. “Information Precision in Online Communities: Player Valuations on www.transfermarkt.de.” International Journal of Sport Finance 13 (4): 319–35. https://d-nb.info/124512143X/34.

Smith, Rory. 2021. “How Transfermarkt Helps Determine the Value of Soccer Players.” New York Times. https://www.nytimes.com/2021/08/12/sports/soccer/soccer-football-transfermarkt.html.

Surowiecki, James. 2005. The Wisdom of Crowds. New York, NY, USA: Random House.

Transfermarkt. 2021. “Transfermarkt Market Value Board - Market Value Definition.” Transfermarkt.com. https://www.transfermarkt.com/market-value-definition/thread/forum/357/thread_id/3433.

Footnotes

His name is Adam LOSER.↩︎
Please don’t @ me if you use Tobits all the time. ~~You’re probably incredibly dull.~~ It’s just a silly joke.↩︎
The eagle-eyed among you will notice that the tweet is over two years old. I haven’t spent the last two years so seething with rage that I cannot concentrate on building my silly little model that would prove Stefan wrong. I’ve just been kicking this idea around for a couple of years and have finally gotten around to finishing it off.↩︎
Each observation is assumed to contribute equally independent information. Clustered data points, however, will be partially dependent, which makes some of the information contributed by each point redundant. While the data may contain 100 observations, the information contributed will be equivalent to fewer independent observations.↩︎
I will borrow heavily from McElreath’s approach to explaining multilevel models from Statistical Rethinking (2018, 2023) because any attempt by me to improve it will probably be a complete mess. If you find yourself thirsty for knowledge (about statistical modelling, including multilevel models), having read this blog post, Statistical Rethinking (in either book or video form) is an excellent place to start.↩︎
Regularisation deliberately constrains model parameters to discourage the model from fitting to the noise in the data and make the model more generalisable.↩︎
Underfitting describes a situation where a model is too simple and cannot capture the data’s true underlying structure.↩︎
Overfitting, as you may have guessed, is the opposite of underfitting. It describes a situation where a model is too complex and captures the observed data structure in detail but, in the process, also captures the noise and quirks of the sample data.↩︎
I know I probably should have used a legend here, but seasons are discrete values and 12 different values in a legend is silly. All you really need to see is the shifting of the regression lines over time.↩︎
Group-mean centring takes the average value of the variable for each group at the relevant level (in this case, clubs) and subtracts this value from the population values to “centre” them around the group mean. In this instance, the heterogeneity bias is sufficiently negated by group-mean centring at the club level because this is the primary source of the variance in squad values at the group level.↩︎
Nested and crossed grouping structures are another example of the flexibility and complexity that can be specified in a multilevel model. They describe how grouping structures relate to each other when the data has three or more hierarchical levels. Grouping structures are nested when a lower-level grouping is entirely contained within another higher-level grouping, meaning that each of the lower-level groups belongs to only one of the higher-level groups. For example, the clubs in our data are entirely nested within their league. Manchester United have never broken out of their containment and run loose in the Bundesliga (and while teams can be relegated, the lower tiers are not included in this data). Grouping structures are crossed, on the other hand, when the lower-level grouping is only partially contained within the higher-level grouping. The lower-level groups can belong to any higher-level groups (and multiple groups) when crossed. Seasons are crossed with leagues and teams because both can belong to every one of the seasons (and in the case of the leagues, they do belong to each season).↩︎
I chose not to centre time because I don’t think this would make the intercept easier to interpret. Consequently, all three models’ intercepts are high because the average club’s average squad value will be high by 2012/13 standards.↩︎
It’s worth noting that this is a net increase in squad values, so what matters is how much more the incoming players are worth than those who are outgoing.↩︎
I used 10% increases in squad value rather than 1% because they seem more meaningful. For a club with an average squad value, a 1% increase is only ~€2.4m.↩︎
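The arithmetic behind this is short enough to sketch. The only number taken from the footnote is the rounded ~€2.4m figure, so the implied average squad value below is approximate. The second half assumes squad value enters the model on the natural log scale, which is my assumption for this illustration, and `beta_hypothetical` is a placeholder rather than an estimate from the post's models.

```r
# figure quoted in the footnote: a 1% increase is roughly €2.4m
one_pct_increase <- 2.4

# implied average squad value and the equivalent 10% increase (both in €m)
implied_average <- one_pct_increase / 0.01
ten_pct_increase <- implied_average * 0.10

# ASSUMPTION: squad value is log-transformed in the model, so the effect of
# a p% increase is the coefficient multiplied by log(1 + p / 100)
beta_hypothetical <- 1 # placeholder coefficient, not a model estimate
effect_one_pct <- beta_hypothetical * log(1.01)
effect_ten_pct <- beta_hypothetical * log(1.10)

# how much bigger the 10% effect is than the 1% effect
effect_ratio <- effect_ten_pct / effect_one_pct
```

Under that assumption, the effect of a 10% increase is a little under ten times the effect of a 1% increase, whatever the coefficient is, because the coefficient cancels out of the ratio.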
I have lumped manager effects in as a structural advantage on the assumption that better managers raise a club’s baseline outcomes regardless of squad value. However, manager effects may influence the club-level slope, too, as a better manager gets more out of investments in the squad. We also know that manager effects will bleed into Transfermarkt’s squad values - better managers improving players, leading to those players being given higher values - which will bias the population-level estimates (especially the within-effect). This points to a limitation of the model and use of squad values. Football is a complex, dynamic system with many moving parts, all of which can affect outcomes on the pitch. Limitations are okay.↩︎
Luck will also be a factor. A team running hot over multiple seasons will look like meaningful overperformance, and a dip in performances caused by a debilitating injury crisis will, at least partially, be attributed to mistakes made by the club’s decision-makers. While the model should, by design, handle this sort of variance well, football is highly random. Some of this randomness will inevitably bleed into some of the model’s estimates, and I suspect this will show up most at the club level.↩︎
Watch the games, spreadsheet nerds!↩︎
I also checked the numbers for the predicted xG differences and the same trend seems to apply, with the biggest overperformers being a bit of a mix and the underperformers being almost exclusively teams for whom the floor seems to have dropped out during the season.↩︎

Reuse

CC BY-SA 4.0

Citation

For attribution, please cite this work as:

Johnson, Paul. 2024. “Analysing Money’s Effect on Football Using Multilevel Regression.” October 9, 2024. https://paulrjohnson.net/blog/2024-10-09-analysing-money-in-football/.