This is an update to my Analysis Philosphy page, which is still working towards completion I only get 1,750 hits on Google when I search for “Distributionality”, so maybe I should clarify what I mean, though I don’t think it’s anything profound. That data follow distributions is a tautology. When this doesn’t appear the case, it means we’ve failed to properly model hte data generation function. The most typical failure mode is to assume that the distribution is simpler than it is.
Last time on DIY Metrics, we calculated five-man-unit plus/minus ratings from scratch. If we want to use measures like this to compare performance for these groups of players, its important to consider how much game time we have for each unit. There’s a relevant discussion to be had about whether “number of posessions” or “elapsed time” is the best way to compare these groups, (IMO, it depends on what specific question you’re trying to answer with your metric) but today we’ll avoid that discussion and normalize over time because it’s easier.
I mean the term of art, not the concept or field of study. And what’s dumb is how it’s applied. “Machine learning” is also dumb in a similar way. Some definitions for AI You can go back to the beginning if the field if you want to, but modern definitions tend to to be vague. There are good definitions out there, but these sound esoteric and unless you’re really interested in defining AI precisely, you’ll probably just stick with Merriam-Webster or Wikipedia, which means literally:
Last week, I described how to build a plus/minus score for individual players based on data from NBAstuffer. I enjoyed walking through that process, so lets continue the series and expand our focus. Five-man units vs. Individual Players One of the first things I talked about on this site was comparing different metrics and choosing the right one for the task at hand. Plus/minus for individual players is a weird metric, because it’s taking a team outcome (net change in score) and applying it at an individual level.
So this is a blast from the past. NC State (at least while I was there) did something interesting for their prelim. Instead of taking another test like we had to do at the Masters level, they gave all their students a subject unrelated to their research, and had them write a lit review and do a small simulation study. My topic was local polynomial smoothing. I don’t think I did a particularly good job, but afterwards, I posted it on my NCSU website as an example of things I’d written.
In order to better understand some “advanced metrics”, I figured it’d be useful to build them from scratch. (This is also just a fun exercise in data manipulation, cleaning, etc.) For starters, let’s do something easy, namely raw plus/minus. For the code below, I’m using the free example play-by-play data set from NBAstuffer. They seem reputable, though I do have concerns about how widely-used their formatting is; one of the challenges with building a workflow is ensuring that the structure of your incoming data won’t change.
This is a post about communication. One of the through-lines of my academic and professional career is conflict between entrenched subject matter experts (SME) and hot-shot quantitative analysts. As a young undergraduate, I followed Baseball Prospectus Fangraphs through the SABRmetric revolution. I watched Nate Silver bring data-driven prognostication to the world of political journalism which had previously (and arguably still is) dominated by punditry. In my current job, I work with experienced analysts who have often been working on the same systems for years.
I was talking about a curriculum for a new Data Science degree program, and the topic of experimental design came up. Design of Experiments (DOE) is classical subject area for statisticians, and the context of an applied statistics masters degree makes perfect sense, but in the context of data science, it seemed pretty out of place. I say that not because DOE isn’t important but because I think its something “data science” doesn’t often consider.
After reading my my previous post on calibration, my clever wife (who’s been doing calibration-related activities in the context of modeling and simulation) brought to my attention the concept of Brier Scores. (Alternatively, here.) This approach was originally proposed to evaluate weather forecasts (“Verification of weather forecasts has been a controversial subject for more than half a century,” so at least we’ve moved on controversial climate forecasts in this half-century.
Yesterday, Scott Alexander posted his annual predictions review post. I always enjoy this post because it’s externalized introspection. Scott takes the time to formally look at things he thought, consider how right he was about these things, and consider how it should update his thinking moving forward. Most people don’t do this informally let alone formally! I want to respond to two things in the post, the latter of which is answering the question Scott only implies of whether he’s well-correlated or not.