001 : Data Storytelling

Impact of referees on NBA outcomes

There’s always a lot of talk amongst basketball fans about the referees being biased against their team. Houston fans don’t like Scott Foster very much, and in all fairness most fans don’t like Scott Foster very much. Tony Brothers has a bad reputation amongst NBA players. In a recent player survey, Tony Brothers, Scott Foster and Marc Davis ranked as the top three ‘worst referees’.

Looking at the post-lockout NBA (2013-2019), I’ve scraped data from http://basketball-reference.com that connects box score data for regular season and playoff games to the referee assignments for each game.

Getting the data

All the data that I’m using came from scraping http://basketball-reference.com, except for team abbreviations, which I got from Wikipedia and edited to align with Basketball-Reference’s abbreviations (you can see these changes in the code). Unfortunately Basketball-Reference doesn’t have an API, but their website is pretty easy to navigate using the htmltab and read_file functions.
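Here’s a setup chunk with the packages behind the functions used throughout this post (these are the libraries the functions below come from):

library(tidyverse)     # dplyr, stringr, tidyr, ggplot2 and readr (read_file)
library(htmltab)       # htmltab() for reading html tables
library(lubridate)     # mdy(), year(), month(), day()
library(knitr)         # kable() for printing tables
library(RColorBrewer)  # brewer.pal() for the heatmap palette
library(scales)        # rescale() for the heatmap fill values

With that out of the way, here are the steps for getting the data: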

1. Download team abbreviations

Like I mentioned above, team abbreviations were taken from Wikipedia and some edits were made to align with Basketball-Reference. The following htmltab function reads the html table from Wikipedia and turns the data into a tibble. I then make some edits to Charlotte, Phoenix and Brooklyn to align with Basketball-Reference’s naming conventions. I also added in two new abbreviations for teams that don’t exist anymore - the Charlotte Bobcats and the New Orleans Hornets.

team_abbs <- as.tibble(htmltab("https://en.wikipedia.org/wiki/Wikipedia:WikiProject_National_Basketball_Association/National_Basketball_Association_team_abbreviations", which=1))

team_abbs[team_abbs$`Abbreviation/Acronym`=='CHA',]$`Abbreviation/Acronym` = "CHO"
team_abbs[team_abbs$`Abbreviation/Acronym`=='PHX',]$`Abbreviation/Acronym` = "PHO"
team_abbs[team_abbs$`Abbreviation/Acronym`=='BKN',]$`Abbreviation/Acronym` = "BRK"
team_abbs = rbind(team_abbs, c('NOH', 'New Orleans Hornets'))
team_abbs = rbind(team_abbs, c('CHA', 'Charlotte Bobcats'))

2. Get the schedules

For this analysis I’m looking at officiating patterns from the post-lockout era (2013 to 2019). I need to access boxscore data for each game to get the official assignments for those games. Each game is located in an html file with the address ‘YEAR+MONTH+DAY+0+HOMEABBREVIATION.html’. For example, Philadelphia’s October 16, 2018 game in Boston would be coded as 201810160BOS.html. The full URL would be: https://www.basketball-reference.com/boxscores/201810160BOS.html.
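To make that pattern concrete, here’s a minimal standalone sketch that builds a boxscore URL from a game date and a home-team abbreviation (the boxscore_url helper is purely illustrative; the actual pipeline builds this string inside the schedule tibble later on):

# Sketch: build a boxscore URL from a game date and home-team abbreviation
boxscore_url <- function(game_date, home_abb) {
  paste0("https://www.basketball-reference.com/boxscores/",
         format(as.Date(game_date), "%Y%m%d"), "0", home_abb, ".html")
}

boxscore_url("2018-10-16", "BOS")
# "https://www.basketball-reference.com/boxscores/201810160BOS.html"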

To access the data for each of these games, I need the schedules for each season from 2013 to 2019. Basketball-Reference has a table for each month of each season. For October, the url doesn’t have a month identifier; for November through June, it includes a ‘-month’ suffix. So for the 2019 season, October’s schedule would be here: https://www.basketball-reference.com/leagues/NBA_2019_games.html and November’s would be here: https://www.basketball-reference.com/leagues/NBA_2019_games-november.html.

I set up a for loop that goes through the 2013 to 2019 seasons and downloads the schedules for each month into a single tibble. Here’s what that code looks like:

schedule <- tibble()

months <- c('','-november', '-december', '-january','-february','-march','-april','-may','-june')
years <- 2013:2019

for (year in years){
  print(year)
  for (month in months){
    print(month)
    dest <- paste('https://www.basketball-reference.com/leagues/NBA_', year, '_games', month, '.html', sep='')
    
    temp <- as.tibble(htmltab(dest, which=1)) %>% 
      mutate(Month = substring(Date, 6, 8), 
             Day = ifelse(str_detect(substring(Date, 10, 11), ','), substring(Date, 10, 10), substring(Date, 10, 11)), 
             Year = substring(Date, nchar(Date)-3, nchar(Date)), 
             dateClean = mdy(paste(Month, Day, Year)))
    
    schedule <- schedule %>% bind_rows(temp)
  }
}

names(schedule)[3] <- 'Away'
names(schedule)[4] <- 'AwayPTS'
names(schedule)[5] <- 'Home'
names(schedule)[6] <- 'HomePTS'

schedule <- schedule %>% left_join(team_abbs, by=c("Home"="Franchise"))
names(schedule)[13] = "HomeABB"
schedule <- schedule %>% left_join(team_abbs, by=c("Away"="Franchise"))
names(schedule)[14] = "AwayABB"

The last few lines just rename some columns that have ugly names from the htmltab function. The schedule tibble only has the teams’ full names, and I need the abbreviations to build the html codes for accessing the boxscores. So I left-join the team abbreviations onto both the home and away teams’ full names.

I then need to build my html boxscore code, which I’m calling htmlCode. It stitches together the date and the home team abbreviation in the correct format to access the correct webpage (as discussed above).

schedule <- schedule %>% 
  mutate(htmlCode = paste(year(dateClean), 
                          ifelse(nchar(month(dateClean))==1, paste(0, month(dateClean), sep=''), month(dateClean)), 
                          ifelse(nchar(day(dateClean))==1, paste(0, day(dateClean), sep=''), day(dateClean)), 
                          0, HomeABB, sep=""))

3. Download the box score data - warning: here comes a really big for loop.

Now comes the fun part. Located on each boxscore webpage are two data tables that I want - the home and away boxscores - and I can access these using the htmltab function. I also want the Pace statistic, to adjust fouls per game to fouls per 100 possessions, and I want the names of the officials. Pace and official assignments are not located in html tables, so I can’t use htmltab. What I need to do is look through the html source file and find a pattern of characters that can be used to pull the data.

Official Names

Here’s the part of the for loop that will take the official assignments for the boxscore with the correct htmlCode.

    htmlfile <- read_file(paste('https://www.basketball-reference.com/boxscores/', htmlCode, '.html', sep=''))
      
      startString <- str_locate(htmlfile, 'Officials:')
      endString <- str_locate(htmlfile, 'Attendance:')
      
      tempString <-substring(htmlfile, startString, endString)
      
      firstOfficialStart <- str_locate(tempString,'r.html')
      tempString <- substring(tempString, firstOfficialStart[1,2]+3)
      firstOfficialEnd <- str_locate(tempString,'<')
      firstOfficial <- substring(tempString, 1, firstOfficialEnd[1,2]-1)[1]
      secondOfficialStart <- str_locate(tempString,'r.html')
      tempString <- substring(tempString, secondOfficialStart[1,2]+3)[1]
      secondOfficialEnd <- str_locate(tempString,'<')
      secondOfficial <- substring(tempString, 1, secondOfficialEnd[1,2]-1)[1]
      thirdOfficialStart <- str_locate(tempString,'r.html')
      tempString <- substring(tempString, thirdOfficialStart[1,2]+3)[1]
      thirdOfficialEnd <- str_locate(tempString,'<')
      thirdOfficial <- substring(tempString, 1, thirdOfficialEnd[1,2]-1)[1]

Looking at the html source code, I found that the string ‘Officials:’ was where the officials’ names started and the string ‘Attendance:’ was what came after the last official’s name. It took a bit of trial and error to get the correct start and end points for each official, but essentially I’m making a temporary string that takes the characters between ‘Officials:’ and ‘Attendance:’. For the first official, I look at the source code to find where the name is located and trim the string to get the first official’s name. I then do the same thing for the second and third officials. Basically, it’s a bit of data wrangling using the stringr package from the tidyverse.
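As an aside, the same extraction could probably be done more compactly with a single regular expression. Here’s a rough sketch (not part of the pipeline), assuming each name appears as r.html">Name< between ‘Officials:’ and ‘Attendance:’ as described above:

# Sketch: pull all three officials in one pass with a regex
ref_span  <- substring(htmlfile,
                       str_locate(htmlfile, 'Officials:')[1, 1],
                       str_locate(htmlfile, 'Attendance:')[1, 1])
officials <- str_match_all(ref_span, 'r\\.html">([^<]+)<')[[1]][, 2]
# officials should be a character vector of the three assigned referees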

Pace data

Pace data was a bit harder to locate and required a bit more trial and error, but I used the same process to find it as I did for the officials’ names. I needed to include an str_remove call since pace statistics greater than or equal to 100.0 are five characters long and anything less is four characters long. I always take five characters into my pace variable and use str_remove to strip away a ‘<’ if it was part of the five characters I read in. A ‘<’ occurs right after the last digit of pace, so a four-character pace statistic would look like ‘94.6<’ compared to a five-character statistic like ‘101.3’. I then wrap it in as.numeric() to get a number.

Here’s the code:

    pace_loc <- str_locate_all(htmlfile, fixed('data-stat=\"pace'))[[1]][3,2]
    pace <- as.numeric(str_remove(substring(htmlfile, pace_loc+4, pace_loc+8), "\\<"))

Putting it all together

Here’s the entire for loop. It starts by pulling the officials’ names, finds the pace statistic and then downloads the boxscores. I append the officials’ names to each home and visitor table along with pace, and then append both to a tibble called fullSet. If 10 players played for each team in a game, the game would occupy 60 observations in the fullSet tibble (10 players x 2 teams x 3 officials).

fullSet <- tibble()  # holds every player-game-official row appended below

for (htmlCode in schedule$htmlCode){
  #Get the ref data
    tryCatch({
      htmlfile <- read_file(paste('https://www.basketball-reference.com/boxscores/', htmlCode, '.html', sep=''))
      
      startString <- str_locate(htmlfile, 'Officials:')
      endString <- str_locate(htmlfile, 'Attendance:')
      
      tempString <-substring(htmlfile, startString, endString)
      
      firstOfficialStart <- str_locate(tempString,'r.html')
      tempString <- substring(tempString, firstOfficialStart[1,2]+3)
      firstOfficialEnd <- str_locate(tempString,'<')
      firstOfficial <- substring(tempString, 1, firstOfficialEnd[1,2]-1)[1]
      secondOfficialStart <- str_locate(tempString,'r.html')
      tempString <- substring(tempString, secondOfficialStart[1,2]+3)[1]
      secondOfficialEnd <- str_locate(tempString,'<')
      secondOfficial <- substring(tempString, 1, secondOfficialEnd[1,2]-1)[1]
      thirdOfficialStart <- str_locate(tempString,'r.html')
      tempString <- substring(tempString, thirdOfficialStart[1,2]+3)[1]
      thirdOfficialEnd <- str_locate(tempString,'<')
      thirdOfficial <- substring(tempString, 1, thirdOfficialEnd[1,2]-1)[1]
      
      pace_loc <- str_locate_all(htmlfile, fixed('data-stat=\"pace'))[[1]][3,2]
      pace <- as.numeric(str_remove(substring(htmlfile,pace_loc+4, pace_loc+8),"\\<"))
      
      #Now get the player data
      
      boxscore_link <- paste('https://www.basketball-reference.com/boxscores/',htmlCode,'.html',sep='')
      
      home <- htmltab(boxscore_link, which=3)
      game <- schedule[schedule$htmlCode == htmlCode,]
      home$HomeTeam = game$HomeABB
      home$Date = game$dateClean
      home$Result = ifelse(game$HomePTS > game$AwayPTS, 1,0)
      
      home <- home %>%  filter(!str_detect(Starters, 'Reserves'))  %>%
        filter(!str_detect(`Basic Box Score Stats >> MP`, 'Did Not Play')) %>% 
        select(Name = Starters, Team = HomeTeam, Date = Date, FTA = `Basic Box Score Stats >> FTA`, 
               PF = `Basic Box Score Stats >> PF`) %>% mutate(Pace = pace, firstOfficial = firstOfficial, secondOfficial = secondOfficial, 
               thirdOfficial = thirdOfficial) %>%   gather(key='OfficialNumber', value='Official',firstOfficial:thirdOfficial)
    
    
      away <- htmltab(boxscore_link, which=1)
      away$AwayTeam = game$AwayABB
      away$Date = game$dateClean
      away$Result = ifelse(game$HomePTS > game$AwayPTS, 0,1)
      
      away <- away %>%  filter(!str_detect(Starters, 'Reserves'))  %>%
        filter(!str_detect(`Basic Box Score Stats >> MP`, 'Did Not Play')) %>% 
        select(Name = Starters, Team = AwayTeam, Date = Date, FTA = `Basic Box Score Stats >> FTA`, 
               PF = `Basic Box Score Stats >> PF`) %>% 
        mutate(Pace = pace, firstOfficial = firstOfficial, secondOfficial = secondOfficial, thirdOfficial = thirdOfficial) %>%   
        gather(key='OfficialNumber', value='Official',firstOfficial:thirdOfficial)
      
      home <- home %>% mutate(OppFouls = sum(as.numeric(away$PF))/3)
      away <- away %>% mutate(OppFouls = sum(as.numeric(home$PF))/3)
      
      fullSet <- fullSet %>% bind_rows(home, away)
      closeAllConnections()
    }, error=function(e){cat("ERROR :", htmlCode, " ", conditionMessage(e), "\n")})
}

Everything is put inside a tryCatch function to output errors without exiting the loop. For some reason, the occasional game won’t download properly (I get an XML error). I think it may have to do with page load time. Since I’m looking at seven years of games, I’m okay with losing a few observations. In the end I get about 450,000 observations; dividing by 60 means I have about 7,500 games of information. That represents about 80% of all games played (assuming all playoff series go to seven games). For the purposes of this analysis I think that’s sufficient.
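If the failures really are transient load issues, one possible mitigation would be a small retry wrapper around the page read. Here’s a sketch of such a helper (read_file_retry is hypothetical and isn’t used in the loop above):

# Sketch: retry a flaky page read a few times before giving up
read_file_retry <- function(url, tries = 3, wait = 2) {
  for (i in seq_len(tries)) {
    result <- tryCatch(read_file(url), error = function(e) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(wait)  # pause before retrying
  }
  stop("Failed to read ", url, " after ", tries, " attempts")
}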

4. Cleaning the dataset

Now that I have fullSet, I realized that I also wanted the winner of each game. Rather than re-downloading the data (which takes forever!), I created a tibble called winnerName that duplicates the schedule for the home team and away team so I can left-join it against fullSet on the Team name column. I then used mutate to create a variable called ‘Win’ that tests whether the winner of the game (included in winnerName) is equal to the team’s name from the boxscore. If it is, the team gets a 1; if not, a 0. When averaged over the season, this gives the team’s win percentage.

The next interesting variable is the PacePF variable. When I first started looking at the data I didn’t include pace, and it skewed the results such that slower-paced teams had fewer fouls called than faster-paced teams. This makes sense, since there are fewer opportunities to call fouls when there are fewer possessions in a game. PacePF takes the fouls called in a game and divides them by the number of possessions in the game, then multiplies by 100 to get the number of fouls per 100 possessions. This adjusts for pace.
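As a quick worked example with made-up numbers:

# 20 team fouls in a game played at a Pace of 95 possessions
pf   <- 20
pace <- 95
100 * pf / pace   # ~21.1 fouls per 100 possessions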

I also included a variable called Opponent that I use to left-join opponent win percentage onto the data. Additional mutates include creating some factors to be used later and a nested ifelse to get a Season variable so I can filter by season later on.

schedule_result <- schedule %>% mutate(Winner = ifelse(HomePTS > AwayPTS, HomeABB, AwayABB)) %>% 
    select(dateClean, HomeABB, AwayABB, Winner)
  
  winnerName <- bind_rows(schedule_result %>% mutate(Team = HomeABB), schedule_result %>% mutate(Team = AwayABB))
  
  fullSet1 <- fullSet %>% left_join(winnerName, by=c("Date" = "dateClean", "Team" = "Team")) %>%
    mutate(Win = ifelse(Winner == Team, 1, 0)) %>% mutate(Name = as.factor(Name), Team=as.factor(Team), FTA=as.numeric(FTA), PF = as.numeric(PF), 
                                 PacePF = 100*PF/Pace,
                                 OfficialNumber=as.factor(OfficialNumber),
                                 Official=as.factor(Official)) %>% na.omit() %>%
    mutate(Season=ifelse(Date <'2013-07-01',2013,
                  ifelse(Date < '2014-07-01', 2014,
                         ifelse(Date < '2015-07-01',2015,
                                ifelse(Date <'2016-07-01', 2016,
                                       ifelse(Date < '2017-07-01', 2017,
                                              ifelse(Date < '2018-07-01', 2018,
                                                     ifelse(Date < '2019-07-01', 2019, NA))))))),
           Opponent = ifelse(HomeABB==Team, AwayABB, HomeABB)) %>%
    select(-HomeABB, -AwayABB) 
  
  games_played <- fullSet1 %>% count(Team, Date, Season, sort = TRUE) %>% count(Team, Season)
  games_reffed <- fullSet1 %>% count(Team, Date, Official, Season) %>% count(Team, Season, Official)
  game_data <- fullSet1  %>% group_by(Team, Date, Season, Official, Opponent) %>% summarise(Fouls = sum(PacePF), OppFouls = sum(100*OppFouls/Pace)/n(), FoulDiff = Fouls-OppFouls, Win = mean(Win))
  
  foulsPerGame <- fullSet1 %>% group_by(Team, Official, Season) %>% summarise(Fouls=sum(PacePF), WinPct = mean(Win)) %>% left_join(games_reffed) %>% mutate(FPG = Fouls/nn)
## Joining, by = c("Team", "Official", "Season")
OfficialAvg <- foulsPerGame %>%group_by(Season, Official) %>% summarise(SeasonAvg=mean(Fouls/nn))
  TeamAvg <- foulsPerGame %>%group_by(Season, Team) %>% summarise(TeamSeasonAvg=mean(Fouls/nn), SeasonWinPct = mean(WinPct))
  
  fullSet1 <- fullSet1 %>% left_join(TeamAvg, by=c("Opponent"="Team", "Season" = "Season")) %>% 
    mutate(OpponentPCT = SeasonWinPct) %>% select(-SeasonWinPct, -TeamSeasonAvg)
## Warning: Column `Opponent`/`Team` joining character vector and factor,
  ## coercing into character vector
foulsPerGame <- fullSet1 %>% group_by(Team, Official, Season) %>% summarise(Fouls=sum(PacePF), WinPct = mean(Win), OpponentPct = mean(OpponentPCT)) %>% left_join(games_reffed) %>% mutate(FPG = Fouls/nn)
## Joining, by = c("Team", "Official", "Season")
#Need to change these to PacePF rather than PFs...
  foulsByTeam <- fullSet1 %>% group_by(Date, Team) %>% summarise(Fouls=sum(PacePF)/3, FTA = sum(FTA)/3)
  foulsByTeamRef <- fullSet1 %>% group_by(Date, Team, Official) %>% summarise(Fouls=sum(PacePF), FTA = sum(FTA))

Now we can get into some analysis.

Officials are pretty consistent year-over-year

Here’s a distribution of the fouls called per game in the NBA from 2013 to 2019. I’ve included a typical density function (the bell-curve-looking chart) and a cumulative distribution function. Cumulative distribution functions (CDFs) can be interpreted as follows: when a line sits closer to the left, a greater proportion of games have fewer fouls called per 100 possessions than for a line that sits further to the right. For example, in the CDF below, which shows fouls per 100 possessions over the 2013 to 2019 period, in 2018 (the purple line) 50% of games had 20 or fewer fouls per 100 possessions called.

As you can see, the distributions are pretty similar year-over-year. In the season that just ended, fouls per 100 possessions were right in line with other years (the pink line) and higher than in 2018, when fouls per 100 possessions were the lowest of the seven seasons (the purple line).

ggplot(game_data, aes(x=Fouls, color=as.factor(Season))) + 
    geom_density() + theme(legend.title = element_blank()) + xlab("Fouls per 100 possessions") +ylab("") + ggtitle('Distribution of Fouls per 100 Possessions')

ggplot(game_data, aes(x=Fouls, color=as.factor(Season))) + 
    stat_ecdf(geom='smooth') + theme(legend.title = element_blank()) + xlab("Fouls per 100 possessions") +ylab("") +
    ggtitle('Cumulative distribution of Fouls per 100 Possessions')

Some teams get called for more fouls…

Looking across all years, there are certain teams that have more fouls called per 100 possessions than others. From 2013 to 2019, there was a differential of about 4 fouls called per 100 possessions between the most penalized team, the NBA Champion Toronto Raptors (23.2 fouls per 100 possessions) and the least penalized team, the San Antonio Spurs (19.0 fouls per 100 possessions). Other notable teams near the top are the Memphis Grizzlies (23.0), Phoenix Suns (22.8) and Oklahoma City Thunder (22.7). And some big names at the bottom in addition to the Spurs are the Charlotte Hornets (19.4), Minnesota Timberwolves (20.3) and the Atlanta Hawks (20.4).

fpt <- foulsByTeam  %>% mutate(Team=as.character(Team))  %>% mutate(Team = ifelse(Team=="NOH","NOP", ifelse(Team=="CHA", "CHO", Team))) %>% group_by(Team) %>% summarise(FoulsPer100Pos = mean(Fouls)) %>% arrange(desc(FoulsPer100Pos))
  
  ## Above I combined the Charlotte Hornets and Charlotte Bobcats into one team abbreviation and the New Orleans Hornets and New Orleans Pelicans into another. 
  
  kable(fpt)
Team FoulsPer100Pos
TOR 23.19220
MEM 22.98415
PHO 22.78055
OKC 22.65043
LAC 22.53629
NYK 22.49705
BOS 22.14371
PHI 22.10638
UTA 22.03290
HOU 21.95324
WAS 21.94419
GSW 21.74085
DEN 21.70252
MIL 21.68955
SAC 21.63830
MIA 21.63177
IND 21.59965
DAL 21.27908
BRK 21.18571
POR 21.12683
LAL 21.01824
NOP 21.01345
ORL 20.94263
CLE 20.83862
DET 20.70564
CHI 20.49397
ATL 20.39566
MIN 20.27035
CHO 19.35301
SAS 19.08410

This is an interesting result. I didn’t expect to see such a wide spread after adjusting for pace. I’m going to look more closely at the most penalized team and the least penalized team.

The NBA Champion Toronto Raptors don’t have the same statistical distribution of fouls called

Warning: I’m a Raptors fan, but I’m going to hopefully let the statistics do the talking…

Using the Kolmogorov-Smirnov test (KS test), you can test whether two samples are drawn from the same distribution. I’m using the KS test here to ask: “are the fouls called on the Raptors statistically different from the fouls called on the rest of the league?”

So what does the KS-test say?

kstest = tibble()
  
  notTOR <- game_data %>% mutate(TorontoFlag = ifelse(Team=="TOR", "TOR", "OTHER"))
  
  for (season in unique(notTOR$Season)){
    a <- ks.test(filter(notTOR, Season==season & TorontoFlag=="OTHER")$Fouls,filter(notTOR, Season==season & TorontoFlag=="TOR")$Fouls)
    result <- tibble(Season=season, PValue = round(a$p.value,20),Result = ifelse(a$p.value<0.05,'Statistically Different', 'Statistically Similar') )
    kstest <- kstest %>% bind_rows(result)
  }
  
  kable(kstest)
Season PValue Result
2013 0.0000000 Statistically Different
2014 0.0000000 Statistically Different
2015 0.0023224 Statistically Different
2016 0.0996374 Statistically Similar
2017 0.0000000 Statistically Different
2018 0.0000000 Statistically Different
2019 0.0003129 Statistically Different

Every season was statistically different from the rest of the NBA except for 2016. This means that fouls per 100 possessions called on the Raptors would be expected to be drawn from a different population than fouls called on other teams. Graphically, this can be shown using cumulative distribution functions.

ggplot(notTOR, aes(x=Fouls, color = as.factor(TorontoFlag))) + stat_ecdf(geom='smooth') + 
   theme(legend.title = element_blank()) + xlab("Fouls per 100 possessions") +ylab("") +
    ggtitle('Cumulative distribution of Fouls per 100 Possessions: Toronto Raptors (2013-2019)')

And for completeness, here are the CDFs for each year from 2013 to 2019.

ggplot(notTOR, aes(x=Fouls, color = as.factor(TorontoFlag))) + stat_ecdf(geom='smooth') + 
    facet_wrap(vars(Season)) + theme(legend.title = element_blank()) + xlab("Fouls per 100 possessions") +ylab("") +
    ggtitle('Cumulative distribution of Fouls per 100 Possessions: Toronto Raptors')

What’s with the Spurs: Enter the upside-down

Since pace is adjusted for in the data, I think there must be a Popovich factor, as their foul counts are low. These are the CDFs for the Spurs:

notSAS <- game_data %>% mutate(SanAntonioFlag = ifelse(Team=="SAS", "SAS", "OTHER"))
  
  ggplot(notSAS, aes(x=Fouls, color = as.factor(SanAntonioFlag))) + stat_ecdf(geom='smooth') + 
   theme(legend.title = element_blank()) + xlab("Fouls per 100 possessions") +ylab("") +
    ggtitle('Cumulative distribution of Fouls per 100 Possessions: San Antonio Spurs (2013-2019)')

From 2013 to 2019, in 75% of games the Spurs were called for about 21 or fewer fouls, compared to 50% of games for the rest of the league.

And for completeness, here are the CDFs for each year from 2013 to 2019.

ggplot(notSAS, aes(x=Fouls, color = as.factor(SanAntonioFlag))) + stat_ecdf(geom='smooth') + 
    facet_wrap(vars(Season)) + theme(legend.title = element_blank()) + xlab("Fouls per 100 possessions") +ylab("") +
    ggtitle('Cumulative distribution of Fouls per 100 Possessions: San Antonio Spurs')

And here are the KS-tests. As you’d expect, the distributions aren’t similar to the league as a whole. Every season sees the Spurs with a statistically different distribution.

kstest = tibble()
  for (season in unique(notSAS$Season)){
    a <- ks.test(filter(notSAS, Season==season & SanAntonioFlag=="OTHER")$Fouls,filter(notSAS, Season==season & SanAntonioFlag=="SAS")$Fouls)
    result <- tibble(Season=season, PValue = round(a$p.value,20),Result = ifelse(a$p.value<0.05,'Statistically Different', 'Statistically Similar') )
    kstest <- kstest %>% bind_rows(result)
  }
  
  kable(kstest)
Season PValue Result
2013 0.0000050 Statistically Different
2014 0.0000000 Statistically Different
2015 0.0067365 Statistically Different
2016 0.0000000 Statistically Different
2017 0.0000055 Statistically Different
2018 0.0000000 Statistically Different
2019 0.0000000 Statistically Different

So do certain officials call a tighter game for certain teams?

When taking into account the officiating patterns for specific refs, I’ve made three adjustments: official adjustments, team adjustments and win percentage adjustments. I’ve established above that certain teams (cough: Raptors) seem to have a disadvantage when it comes to overall fouls called independent of the referees.

To see if there are certain officiating patterns, I’ve taken the data set and looked at fouls called per 100 possessions by referee and by season.

For example, here are the top five teams that Marc Davis called the most fouls on in the 2019 season.

diffPlotData <- foulsPerGame %>% arrange(desc(FPG)) %>% left_join(OfficialAvg) %>% left_join(TeamAvg) %>%
    mutate(TeamDifferential = FPG-TeamSeasonAvg, PctImpact = 1 + (WinPct - (SeasonWinPct/(OpponentPct+SeasonWinPct)))) %>% 
    mutate(RefImpact = as.numeric(100*(FPG/SeasonAvg)*(2-FPG/TeamSeasonAvg)*(1/PctImpact))) %>%
    select(-TeamDifferential) %>% 
    arrange(desc(RefImpact))
## Joining, by = c("Official", "Season")
## Joining, by = c("Team", "Season")
k<-diffPlotData %>% filter(Official=="Marc Davis" & Season==2019) %>% 
    select(Season, Team, Official, GamesOfficiated=nn, FoulsPer100Possessions = FPG) %>% 
    arrange(desc(FoulsPer100Possessions)) %>% head(5)
  
  kable(k)
Season Team Official GamesOfficiated FoulsPer100Possessions
2019 PHO Marc Davis 3 25.76730
2019 MEM Marc Davis 1 24.52026
2019 OKC Marc Davis 4 23.94846
2019 GSW Marc Davis 8 23.87804
2019 LAC Marc Davis 6 23.81115

The issue with looking purely at fouls per 100 possessions and saying ‘wow, Marc Davis really hates the Suns’ is that we’re not comparing Marc Davis’s treatment of the Suns to: 1. how Marc Davis calls games involving the rest of the league in that season, and, 2. how all the officials call games involving the Suns in that season.

Foul adjustments

I’ve made two adjustments to the data to try to capture these issues related to fouls called in games.

The first is an ‘official’ adjustment. It compares the fouls called per 100 possessions on a team in games worked by that official against the average fouls per 100 possessions that official called across all of his games in that season.

The second is the ‘team’ adjustment. As shown above, some teams just get more fouls called on them, whether due to style of play, pace or ‘league conspiracy’ (I jest). This adjustment compares the fouls called per 100 possessions on the team in games worked by the official in question against the team’s season-average fouls per 100 possessions, to get an idea of how that official calls the team relative to the team’s usual reputation.

So for example, in Phoenix Suns games officiated by Marc Davis in 2019, the Suns averaged 25.8 fouls per 100 possessions (this is the FPG variable, below). Across the entire season, the Suns averaged 23.7 fouls per 100 possessions (this is the TeamSeasonAvg variable, below), while Marc Davis called 21.2 fouls per 100 possessions across all of his games in 2019 (this is the SeasonAvg variable, below).

The Official Adjustment would be 25.7673/21.15367 = 1.218101 - meaning Marc Davis called about 22% more fouls on the Suns than he did on average.

The Team Adjustment would be 25.7673/23.727 = 1.085991 - meaning in games officiated by Marc Davis, the Suns had about 9% more fouls called than they did on average.

Combining the two as Official Adjustment x (2 - Team Adjustment) gives a foul impact of 1.218101 x (2 - 1.085991) = 1.113355. So introducing Marc Davis to Phoenix Suns games in 2019 led to an increase in fouls per 100 possessions of about 11%.

kable(diffPlotData %>% filter(Season=="2019", Team=="PHO", Official=="Marc Davis") %>% select(Team, Official, Season, FPG, SeasonAvg, TeamSeasonAvg) %>% mutate(OfficialAdjustment = FPG/SeasonAvg, TeamAdjustment = FPG/TeamSeasonAvg) %>% mutate (FoulImpact = OfficialAdjustment*(2-TeamAdjustment)))
Team Official Season FPG SeasonAvg TeamSeasonAvg OfficialAdjustment TeamAdjustment FoulImpact
PHO Marc Davis 2019 25.7673 21.15367 23.727 1.218101 1.085991 1.113355

Refs can affect the game in ways that aren’t seen through fouls

So when I started this analysis I was going to focus solely on fouls and figure out an official’s impact on a game through fouls called. However, I realized while pulling some data together that I wouldn’t be telling the whole story. There can be other missed calls or no-calls or strange decisions that aren’t related specifically to personal fouls. I decided to add another variable that looks at win percentages, since teams may have terrible win percentages with specific refs.

A prime example was seen in the 2019 playoffs when the Houston Rockets had Scott Foster assigned to their game, and basketball analysts were quick to point out the Rockets’ terrible record in playoff games officiated by Scott Foster.

I’ve tried to measure the impact of officials on win percentage by looking at:

A. The team’s win percentage in the season for all games in the dataset;
B. The opponent’s win percentage in the season for all games in the dataset; and
C. The team’s win percentage in games officiated by the referee in question in that season.

Using these three inputs, I’ve proposed a Win Percentage Impact metric that is calculated as:

1 + (Team’s Win Pct With Specific Ref - (Team’s Overall Season Win Pct / (Opponent’s Win Pct in the Season + Team’s Overall Season Win Pct)))

Or

WinPctImpact = 1 + (C - (A/(A+B))) - using the letters above.

I’m trying to capture what would have been expected for the outcome (that’s the A/(A+B)) and subtracting it from the actual result in those games with that official. I add one to the number to centre it around one rather than around zero.

This may not be perfect, but it gives a scale centered around 1 where an impact greater than 1 is in favour of the team and an impact less than 1 would be considered a negative bias.
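Here’s a minimal helper expressing that definition with the A/B/C labels above (just a sketch of the arithmetic; win_pct_impact is an illustrative name, not a function used elsewhere in this analysis):

# WinPctImpact = 1 + (C - A/(A + B))
win_pct_impact <- function(season_win_pct, opponent_win_pct, win_pct_with_ref) {
  1 + (win_pct_with_ref - season_win_pct / (season_win_pct + opponent_win_pct))
}

# Phoenix with Marc Davis in 2019, using the values from the table below
win_pct_impact(0.2081247, 0.5172489, 0)   # ~0.713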

So for Phoenix and Marc Davis again, the WinPctImpact would be:

WinPctImpact = 1 + (0 - (0.2081247/(0.2081247 + 0.5172489))) = 0.7130793.

kable(diffPlotData %>% filter(Season=="2019", Team=="PHO", Official=="Marc Davis") %>% select(Team, Official, Season, RefWinPct = WinPct, SeasonWinPct, OpponentPct) %>% mutate(WinPctImpact = (1+(RefWinPct - SeasonWinPct/(SeasonWinPct+OpponentPct))) ))
Team Official Season RefWinPct SeasonWinPct OpponentPct WinPctImpact
PHO Marc Davis 2019 0 0.2081247 0.5172489 0.7130793

Putting it all together

To get an overall RefImpact score, I multiply FoulImpact by the inverse of WinPctImpact. I take the inverse so they’re both moving in the same direction (i.e., greater than one is a ‘disadvantage’ and less than one is an ‘advantage’). I then multiply this by 100, since everyone always likes things being out of 100. So a perfectly unbiased ref for a specific team would have a RefImpact score of 100.
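As a quick sanity check of the definition, here are the values from the top row of the 2019 table further below (Mike Callahan officiating Boston) plugged into that formula:

# RefImpact = 100 * FoulImpact / WinPctImpact
100 * 1.0051836 / 0.5053707   # ~198.9, matching the table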

diffPlotData <- foulsPerGame %>% arrange(desc(FPG)) %>% left_join(OfficialAvg) %>% left_join(TeamAvg) %>%
    mutate(RefWinPct = WinPct) %>% mutate(OfficialAdjustment = FPG/SeasonAvg, TeamAdjustment = FPG/TeamSeasonAvg) %>% 
    mutate (FoulImpact = OfficialAdjustment*(2-TeamAdjustment)) %>% 
    mutate(WinPctImpact = (1+(RefWinPct - SeasonWinPct/(SeasonWinPct+OpponentPct)))) %>% 
    mutate(RefImpact = 100*FoulImpact*(1/WinPctImpact)) %>%
    arrange(desc(RefImpact))
## Joining, by = c("Official", "Season")
## Joining, by = c("Team", "Season")

The table below shows the largest RefImpact for each Team:Official combination in the 2019 season (minimum of five games officiated). The toughest Official:Team pairing is Mike Callahan with the Boston Celtics - his RefImpact score is nearly 200. He calls roughly an average number of fouls on Boston, but their win percentage is terrible in his games.

  kable(diffPlotData %>% filter(Season==2019, nn>=5) %>% select(Season, Team, Official, FoulImpact, WinPctImpact, RefImpact) %>% group_by(Team) %>% top_n(1))
## Selecting by RefImpact
Season Team Official FoulImpact WinPctImpact RefImpact
2019 BOS Mike Callahan 1.0051836 0.5053707 198.9003
2019 SAC Kevin Scott 1.0194418 0.5129371 198.7459
2019 BRK Brandon Adair 1.0369219 0.5278196 196.4538
2019 MEM Tom Washington 1.0832577 0.5586404 193.9097
2019 ORL Scott Wall 0.9141238 0.5105586 179.0439
2019 PHI Derek Richardson 1.0712792 0.6024405 177.8232
2019 ATL John Goble 1.0885081 0.6150747 176.9717
2019 WAS Scott Twardoski 0.9846930 0.5718867 172.1832
2019 CHI Kevin Cutler 1.0248325 0.6084254 168.4401
2019 DAL Jonathan Sterling 0.9408535 0.5629238 167.1369
2019 TOR Brian Forte 1.0469611 0.6365599 164.4717
2019 HOU Pat Fraher 1.0698855 0.6564406 162.9828
2019 PHO Tyler Ford 1.1026866 0.6945068 158.7726
2019 DET Derrick Collins 1.0995975 0.7266369 151.3270
2019 POR Mark Lindsay 1.0067114 0.6689314 150.4955
2019 GSW John Goble 1.0138228 0.6808992 148.8947
2019 OKC John Goble 1.0221091 0.6898892 148.1555
2019 CLE Rodney Mott 0.9753442 0.6617915 147.3794
2019 DEN Zach Zarba 0.9738101 0.6608538 147.3564
2019 MIA Scott Foster 0.9830912 0.6677663 147.2208
2019 LAL Ken Mauer 0.9376757 0.6427890 145.8761
2019 NYK Sean Wright 1.0003332 0.6991680 143.0748
2019 UTA Ed Malloy 1.0288903 0.7200854 142.8845
2019 MIL Brent Barnaky 0.9296845 0.6628068 140.2648
2019 IND Marc Davis 0.9402634 0.6981466 134.6799
2019 CHO Derrick Collins 0.9337469 0.6935120 134.6403
2019 SAS Eric Dalen 0.9088094 0.6791793 133.8099
2019 MIN Gediminas Petraitis 0.9548545 0.7301576 130.7738
2019 NOP Rodney Mott 0.9575534 0.7506633 127.5610
2019 LAC John Goble 1.0776021 0.8536205 126.2390

A name you won’t see on the list is Scott Foster, which raises the question:

Does Scott Foster hate us?

I mean, if you’re a Raptors fan, no, not really. Officiating Toronto’s 2019 games, Foster had a score of about 96. This means the Raptors actually do slightly better than expected in games Foster officiates. He calls about an even number of fouls, and the Raptors’ win percentage is slightly higher than expected when Scott shows up. Even I’m surprised by this. People may think he’s not the best official (I tend to agree), but he seems to be equally subpar towards both teams rather than showing a bias.

Now if you’re a fan of the Grizzlies, Hawks, Rockets (surprise, surprise), 76ers or Cavs, yes, Scott hates you.

  kable(diffPlotData %>% filter(Season==2019, nn>=3, Official=="Scott Foster") %>% select(Season, Team, Official, FoulImpact, WinPctImpact, RefImpact))
Season Team Official FoulImpact WinPctImpact RefImpact
2019 MEM Scott Foster 1.0575210 0.5470768 193.30395
2019 ATL Scott Foster 1.0711483 0.5991798 178.76908
2019 MIA Scott Foster 0.9830912 0.6677663 147.22083
2019 CLE Scott Foster 0.9519794 0.6868480 138.60118
2019 PHI Scott Foster 0.9821365 0.7889502 124.48651
2019 HOU Scott Foster 1.0341822 0.8319093 124.31430
2019 OKC Scott Foster 1.0004699 0.8227727 121.59736
2019 DAL Scott Foster 0.8750520 0.7720084 113.34746
2019 LAC Scott Foster 1.0687552 0.9977861 107.11265
2019 TOR Scott Foster 1.0080289 1.0462297 96.34872
2019 BOS Scott Foster 0.9521920 1.0109618 94.18674
2019 DEN Scott Foster 0.9796784 1.0545283 92.90205
2019 WAS Scott Foster 0.9628572 1.0456791 92.07960
2019 BRK Scott Foster 0.9863848 1.0728018 91.94474
2019 IND Scott Foster 0.9243631 1.0171456 90.87815
2019 NYK Scott Foster 0.9779518 1.1254495 86.89433
2019 POR Scott Foster 0.9707246 1.1715720 82.85659
2019 ORL Scott Foster 0.8762911 1.0987563 79.75300
2019 MIL Scott Foster 0.8809798 1.2015946 73.31756
2019 SAS Scott Foster 0.8499485 1.1804840 72.00000
2019 GSW Scott Foster 1.0083915 1.4063355 71.70348
2019 LAL Scott Foster 0.9329831 1.3216338 70.59316
2019 UTA Scott Foster 0.9820196 1.4216737 69.07489
2019 SAC Scott Foster 0.9361580 1.4385647 65.07584

And if you’re the Warriors, Kings, Pacers or Lakers, you should be pretty happy when Scott Foster walks into your building.

How long has Foster hated the Rockets?

I’m writing this in jest; I don’t actually think Scott Foster actively hates the Rockets. Maybe just Chris Paul (we’ll see when he joins OKC). But here’s the breakdown of Scott Foster’s RefImpact for the Rockets over time.

  kable(diffPlotData %>% filter(Team=="HOU", Official=="Scott Foster") %>% select(Season, Team, Official, FoulImpact, WinPctImpact, RefImpact) %>% arrange(Season))
Season Team Official FoulImpact WinPctImpact RefImpact
2013 HOU Scott Foster 0.8880490 1.4312080 62.04891
2014 HOU Scott Foster 0.9603178 1.0489909 91.54682
2015 HOU Scott Foster 0.9802553 1.0450976 93.79557
2016 HOU Scott Foster 1.0383135 1.3287667 78.14115
2017 HOU Scott Foster 0.9027420 0.9846645 91.68016
2018 HOU Scott Foster 0.7311231 1.2145899 60.19506
2019 HOU Scott Foster 1.0341822 0.8319093 124.31430

So, Scott Foster’s fascination with the Rockets is really just a 2019 thing. He was actually quite pro-Houston in all years prior to 2019.

Whither Marc and Tony?

The other two much-maligned refs are Tony Brothers and Marc Davis. So here’s a table for 2019 showing the RefImpacts for Tony and Marc. There are some NAs since I’ve limited the table to Team:Official combinations with at least three games officiated.

  kable(diffPlotData %>% filter(Season==2019, nn>=3, Official=="Marc Davis" | Official == "Tony Brothers") %>% select(Season, Team, Official, RefImpact) %>% spread(key=Official, value=RefImpact))
Season Team Marc Davis Tony Brothers
2019 ATL 89.51189 NA
2019 BOS 88.91339 75.26967
2019 BRK 79.75380 NA
2019 CHI 95.39141 NA
2019 CHO 76.20787 118.11010
2019 CLE 138.62071 NA
2019 DAL 76.11391 118.66721
2019 DEN 69.98748 65.20177
2019 DET 90.73033 119.10518
2019 GSW 118.53018 66.63027
2019 HOU 122.90409 NA
2019 IND 134.67995 85.85253
2019 LAC 94.43896 134.49564
2019 LAL 72.01400 NA
2019 MEM NA 115.91210
2019 MIA 113.10720 NA
2019 MIL 107.39844 90.72186
2019 MIN 113.22994 166.79996
2019 NOP NA 104.39771
2019 NYK 104.71669 90.35938
2019 OKC 104.92639 111.94815
2019 ORL 120.97842 81.62341
2019 PHI 138.52458 95.14026
2019 PHO 156.13342 NA
2019 POR 111.59470 100.90415
2019 SAC 89.83028 105.94420
2019 SAS 82.56237 115.01777
2019 TOR 70.65162 80.83718
2019 UTA 112.71276 99.50794
2019 WAS NA 84.77816

If you’re playing Team X, which ref do you want to be there?

Let’s look at the NBA Champion Toronto Raptors. To answer this question, I’ve decided to make a heatmap. The code below provides a breakdown of how I’m making the heat map.

The adjustment I’m making takes Toronto’s RefImpact scores for each referee and divides them by the RefImpact scores for each referee and each opposing team. For example, in 2019 Marc Davis had a RefImpact score for Toronto of 70.65162 (in the table above). The code takes 70.65162 and divides it by every other team’s RefImpact score for Marc Davis. Say we wanted to see a Toronto vs Golden State matchup with Marc Davis officiating - we would take Toronto’s score of 70.65162 and divide it by Golden State’s RefImpact score of 118.53018. This works out to about 0.596, and I multiply that by 100 to get a relative score of about 60. Any score less than 100 would favour Toronto in a matchup and anything more than 100 favours the other team. So a score of 60 favours the Raptors.

Below is a heatmap of all RefImpact scores for each team and referee relative to the Toronto Raptors. Anything white to light gray would favour the Raptors, anything in that light-medium gray would be about even. Moving from medium-gray to blue starts to slightly favour the opponent and blue to red to black becomes increasingly in favour of the opponent. Purple tiles are NAs - these are referee-opponent combinations that didn’t occur in the 2019 dataset.

TOR <- diffPlotData %>% filter(Season==2019, Team=="TOR") %>% select(Official, RefImpact)
## Adding missing grouping variables: `Team`
TORTable <- diffPlotData %>% filter(Season==2019, Team!="TOR", Official %in% TOR$Official) %>% select(Season, Team, Official, RefImpact) %>% spread(key=Official, value=RefImpact)
  
  adjustment <- TOR %>% spread(key=Official, value=RefImpact)
  adjustment <- adjustment[,-1]
  
  graphData <- tibble()
  
  for (i in 1:nrow(TORTable)){
    graphData <- bind_rows(graphData,c(TORTable[i,1:2],100*adjustment/TORTable[i,3:ncol(TORTable)]))
  }
  
  graphData <- graphData %>% gather(key="Official", value="RefImpact", -Season, -Team)
  
  breaks <- c(50,150)
  
  blups <- brewer.pal(name="BuPu", n=9)
  
  sc <- rescale(graphData$RefImpact, to = c(0,1))
  
  ggplot(graphData, aes(Team, Official)) + geom_tile(aes(fill = RefImpact)) +
    scale_fill_gradientn(colours= c('white', 'gray70', 'steelblue', 'red', 'black'), values =c(0,.1957, 0.33747, 0.47, 0.74), na.value = 'purple') #breaks in the scale bar

Say the Raptors were set to play the Warriors: the best three-official crew for the Raptors would be something like Ben Taylor, Marat Kogut and Tre Maddox. If you’re the Warriors, the best three-official crew would be Ron Garretson, Rodney Mott and Mike Callahan.

Conclusion

Overall, you see some interesting variation in NBA outcomes based on the referees assigned to the game. This is an interesting topic with some applicability to daily fantasy or game-outcome predictions. This project gives a wide overview of scraping data, exploratory data analysis, feature engineering and some modelling to understand how refereeing impacts the game.