DIY Metrics: Game Logs

Previously on DIY Metrics…

Last time in the DIY Metrics series, we had reached the point where we could extract a host of individual metrics from our data set using a function we’d named add_simple_stat_indicators:

add_simple_stat_indicators <- function(tb){
  
  tb %>% 
    mutate(
      gotblk = (description == "BLOCK"),
      gotstl = (description == "STEAL"),
      gotast = (description == "ASSIST"),
      gotreb = map_lgl(description, str_detect, "REBOUND"),
      tfoulu = map_lgl(description, str_detect, "T.FOUL"),
      tfoull = map_lgl(description, str_detect, "T.Foul"),
      fgmade = event_type == "shot",
      fgmiss = event_type == "miss",
      shotft = event_type == "free throw",
      foul = event_type == "foul",
      turnover = event_type == "turnover",
      shot3 = map_lgl(description, str_detect, "3PT"),
      made3 = map2_lgl(shot3, fgmade, function(a, b) a && b),
      miss3 = map2_lgl(shot3, fgmiss, function(a, b) a && b),
      missathing = map_lgl(description, str_detect, "MISS"),
      madeft = map2_lgl(shotft, !missathing, function(a, b) a && b),
      missft = map2_lgl(shotft, missathing, function(a, b) a && b),
      tfoul = map2_lgl(tfoulu, tfoull, function(a, b) a | b),
      pfoul = map2_lgl(foul, !tfoul , function(a, b) a && b))
  
}
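As an aside, the indicators above call `str_detect` and `&&` once per row through `map_lgl` and `map2_lgl`. Both `str_detect` and `&` are already vectorized, so the same columns can be computed directly, which may matter on a full season of play-by-play. Here is a minimal sketch of that equivalence. The three rows are invented for illustration and are not from the actual data set.

```r
library(dplyr)
library(purrr)
library(stringr)

# Toy play-by-play rows; these descriptions are made up for illustration
toy <- tibble(
  description = c("Redick 3PT Jump Shot", "MISS Embiid 3PT Jump Shot", "Simmons REBOUND"),
  event_type  = c("shot", "miss", "rebound")
)

# Row-by-row style, as in add_simple_stat_indicators
rowwise_version <- toy %>%
  mutate(
    fgmade = event_type == "shot",
    shot3  = map_lgl(description, str_detect, "3PT"),
    made3  = map2_lgl(shot3, fgmade, function(a, b) a && b)
  )

# Vectorized style: str_detect() and & operate on whole columns at once
vectorized_version <- toy %>%
  mutate(
    fgmade = event_type == "shot",
    shot3  = str_detect(description, "3PT"),
    made3  = shot3 & fgmade
  )

# Compare the derived columns from the two approaches
identical(rowwise_version$made3, vectorized_version$made3)
```

The rest of this post keeps the original `map_lgl` version so that it matches the previous installment.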

This time, we’ll use the output to build player game logs!

Game Logs

Game logs are a common data format for NBA data and for sports data in general. You can find them, for example, on basketball-reference.com, organized by player. Here is the game log for JJ Redick from 2017-2018.

They’re useful as ends in themselves (e.g., for looking at how a player’s counting stats change over the course of a season), but more importantly they serve as building blocks for more “advanced” metrics. Advanced stats basically come in three varieties. First, there are those based on aggregated team performance with and without individual players or combinations of players on the court (I think of these as “plus/minus-type” metrics; net rating would be an example). Second, there are metrics based on (mostly linear) combinations of traditional box score statistics (think PER, effective field goal percentage, and similar). Third, there are stats based on player position/tracking data, such as shot quality based on location on the floor, defender location on the floor, etc. Game logs are critical mostly for the second type.
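To make the second variety concrete, here is a small sketch that computes effective field goal percentage, (FGM + 0.5 × 3PM) / FGA, from box score counts. The helper name `efg_pct` is my own for this illustration, and the three stat lines are copied from the single-game output shown later in this post.

```r
library(dplyr)

# Effective field goal percentage: a made three counts as 1.5 made field goals.
# efg_pct is a hypothetical helper name, not part of the series' code.
efg_pct <- function(fgm, fg3m, fga) {
  ifelse(fga > 0, (fgm + 0.5 * fg3m) / fga, NA_real_)
}

# Three stat lines taken from the single-game log later in the post
box <- tibble(
  player = c("Ben Simmons", "JJ Redick", "Robert Covington"),
  FGM    = c(7, 4, 9),
  FGA    = c(15, 10, 15),
  `3PM`  = c(0, 4, 7)
)

box %>% mutate(eFG = efg_pct(FGM, `3PM`, FGA))
```

Every input here is a column of the game log, which is why game logs are the natural starting point for this family of metrics.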

Building a game log

Generally, game logs record “box score” stats by game. These include:

- Field Goals Made
- Field Goals Attempted
- Free Throws Made
- Free Throws Attempted
- 3-point Field Goals Made
- 3-point Field Goals Attempted
- Rebounds
- Steals
- Assists
- Blocks
- Turnovers

If you refer back to the function above, these are the same values we spent the last post counting up!

To get back to where we were last time, let’s take our raw data set and run our function to get counting statistic indicators:

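tmp %>%
  filter(team == "PHI") %>% 
  unnest() %>% 
  # keep only the first game; ungroup() so slice(1) acts on the whole nested
  # table even where nest() preserves grouping (tidyr >= 1.0)
  group_by(game_id) %>% 
  nest() %>% 
  ungroup() %>% 
  slice(1) %>% 
  unnest() %>% 
  get_ast_stl_blk(pt) %>% 
  add_simple_stat_indicators() %>% 
  select(player, gotblk, gotstl, gotast, shot3, made3, miss3)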
## # A tibble: 269 x 7
##    player           gotblk gotstl gotast shot3 made3 miss3
##    <chr>            <lgl>  <lgl>  <lgl>  <lgl> <lgl> <lgl>
##  1 Robert Covington TRUE   FALSE  FALSE  FALSE FALSE FALSE
##  2 Ben Simmons      TRUE   FALSE  FALSE  FALSE FALSE FALSE
##  3 Markelle Fultz   TRUE   FALSE  FALSE  FALSE FALSE FALSE
##  4 Dario Saric      TRUE   FALSE  FALSE  FALSE FALSE FALSE
##  5 Joel Embiid      TRUE   FALSE  FALSE  FALSE FALSE FALSE
##  6 Robert Covington TRUE   FALSE  FALSE  FALSE FALSE FALSE
##  7 Ben Simmons      FALSE  TRUE   FALSE  FALSE FALSE FALSE
##  8 JJ Redick        FALSE  TRUE   FALSE  FALSE FALSE FALSE
##  9 Robert Covington FALSE  TRUE   FALSE  FALSE FALSE FALSE
## 10 Ben Simmons      FALSE  TRUE   FALSE  FALSE FALSE FALSE
## # ... with 259 more rows

We can see from the above output that the function we concluded with last time is creating the indicator columns we wanted. (Also note that the above code has only taken a single game’s worth of data. This makes things a bit easier to work with for now.)
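If all you need is one game’s worth of rows, the nest/slice/unnest chain can also be written as a plain filter. This is a sketch on an invented two-game tibble, assuming the rows you want are the ones sharing the first `game_id` that appears.

```r
library(dplyr)

# Invented rows from two games (not real data)
toy <- tibble(
  game_id = c("g1", "g1", "g2", "g2", "g2"),
  player  = c("A", "B", "A", "B", "C")
)

# Keep only the rows belonging to the first game_id that appears
first_game <- toy %>%
  filter(game_id == first(game_id))

first_game
```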

The next thing we’ve got to do is aggregate these values to get our game log. We do this with the function below, which takes simple sums over the relevant statistical categories:

make_simple_stats_game_log <- function(tb){
  
  tb %>% 
    # drop events that are not credited to a player
    filter(!is.na(player)) %>% 
    group_by(game_id, player, playoffs, date) %>%
    # summing a logical indicator counts its TRUE values;
    # attempts are makes plus misses
    summarise(
      FGM = sum(fgmade, na.rm = TRUE),
      FGA = sum(fgmade, na.rm = TRUE) + sum(fgmiss, na.rm = TRUE),
      FTM = sum(madeft, na.rm = TRUE),
      FTA = sum(madeft, na.rm = TRUE) + sum(missft, na.rm = TRUE),
      `3PM` = sum(made3, na.rm = TRUE),
      `3PA` = sum(made3, na.rm = TRUE) + sum(miss3, na.rm = TRUE),
      REB = sum(gotreb, na.rm = TRUE),
      STL = sum(gotstl, na.rm = TRUE),
      AST = sum(gotast, na.rm = TRUE),
      BLK = sum(gotblk, na.rm = TRUE),
      TO = sum(turnover, na.rm = TRUE)
    )
}
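To see the aggregation step in isolation, here is a minimal sketch on invented indicator rows. It uses only a few of the columns, and the players “A” and “B” are placeholders. The `.groups = "drop"` argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Invented indicator rows for two players in one game (not real data)
toy_events <- tibble(
  game_id = "g1",
  player  = c("A", "A", "A", "B", "B", NA),
  fgmade  = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  fgmiss  = c(FALSE, TRUE, FALSE, FALSE, NA, FALSE),
  gotreb  = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

toy_log <- toy_events %>%
  filter(!is.na(player)) %>%   # the last row has no player and is dropped
  group_by(game_id, player) %>%
  summarise(
    FGM = sum(fgmade, na.rm = TRUE),   # TRUE counts as 1, FALSE as 0
    FGA = sum(fgmade, na.rm = TRUE) + sum(fgmiss, na.rm = TRUE),
    REB = sum(gotreb, na.rm = TRUE),
    .groups = "drop"                   # return an ungrouped tibble
  )

toy_log
```

Note that `na.rm = TRUE` means a missing indicator is treated as “did not happen” rather than propagating `NA` into the totals.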

We can wrap all of these functions up into a single one to make our code easier to read:

get_simple_game_log <- function(tb, tn){
  
  tb %>% 
    mutate(team = tn) %>% 
    get_ast_stl_blk(tn) %>% 
    add_simple_stat_indicators() %>% 
    make_simple_stats_game_log() %>% 
    ungroup()
  
}

And now, it becomes a simple matter to generate game logs for the whole season:

simplestats <- tmp %>% 
  mutate(`Game Log (Simple stats)` =  
           map2(`team events`, pt, get_simple_game_log),
         `Regular Season per game (Simple stats)` = 
           map2(`Game Log (Simple stats)`, pt, make_simple_stats_pergame))

write_rds(simplestats, "clean-data/simple-stats-1718.rds")
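The `map2` call is doing the heavy lifting here: it walks over the list-column of per-team events and the team abbreviation in parallel, calling a two-argument function once per team. Here is a sketch of that pattern on invented data. The function `count_games` is hypothetical and merely stands in for `get_simple_game_log`, and I am assuming that `pt` holds the team abbreviation, as the calls above suggest. The `nest()` syntax assumes tidyr 1.0 or later.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Invented stand-in for the nested season data: one row per team,
# with that team's events stored in a list-column
toy_season <- tibble(
  pt      = c("PHI", "PHI", "BOS"),
  game_id = c("g1", "g2", "g1"),
  player  = c("A", "A", "B")
) %>%
  nest(`team events` = c(game_id, player))

# Hypothetical per-team function taking (events, team name)
count_games <- function(tb, tn) {
  tb %>%
    mutate(team = tn) %>%
    count(team, player, name = "games")
}

# map2() pairs each team's events with its abbreviation
toy_result <- toy_season %>%
  mutate(`games played` = map2(`team events`, pt, count_games))

toy_result$`games played`[[1]]
```

Each element of the new list-column is itself a tibble, which is how `simplestats` ends up holding a full game log per team.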

Running the code above actually takes forever, so we won’t do it here. But we can look at the results from a single game:

tmp %>% 
  filter(team == "PHI") %>% 
  unnest() %>% 
  # first game only; ungroup() guards against nest() keeping the grouping
  group_by(game_id) %>% nest() %>% ungroup() %>% slice(1) %>% unnest() %>% 
  get_ast_stl_blk(pt) %>% 
  add_simple_stat_indicators() %>% 
  make_simple_stats_game_log() %>% 
  ungroup() %>% 
  select(-game_id, -playoffs)
## # A tibble: 10 x 13
##    player date    FGM   FGA   FTM   FTA `3PM` `3PA`   REB   STL   AST   BLK
##    <chr>  <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
##  1 Amir ~ 2017~     2     7     1     2     0     0     5     0     1     0
##  2 Ben S~ 2017~     7    15     4     6     0     0    10     2     5     1
##  3 Dario~ 2017~     1     5     1     1     0     2     3     0     2     1
##  4 Jerry~ 2017~     5    10     0     0     3     7     3     1     3     0
##  5 JJ Re~ 2017~     4    10     0     0     4     8     2     1     4     0
##  6 Joel ~ 2017~     7    15     4     4     0     4    13     0     3     1
##  7 Marke~ 2017~     5     9     0     2     0     0     3     0     1     1
##  8 Rober~ 2017~     9    15     4     4     7    11     7     1     1     2
##  9 T.J. ~ 2017~     1     2     0     0     0     0     0     0     0     0
## 10 Timot~ 2017~     2     6     0     0     1     3     2     0     0     0
## # ... with 1 more variable: TO <int>

So there we have it! Game logs with simple counting statistics generated from play-by-play data. Not too shabby!