Local Polynomial Smoothing

So this is a blast from the past. NC State (at least while I was there) did something interesting for their prelim. Instead of taking another test like we had to do at the master’s level, they gave all their students a subject unrelated to their research, and had them write a lit review and do a small simulation study. My topic was local polynomial smoothing. I don’t think I did a particularly good job, but afterwards, I posted it on my NCSU website as an example of things I’d written.

DIY Metrics

In order to better understand some “advanced metrics”, I figured it’d be useful to build them from scratch. (This is also just a fun exercise in data manipulation, cleaning, etc.) For starters, let’s do something easy, namely raw plus/minus. For the code below, I’m using the free example play-by-play data set from NBAstuffer. They seem reputable, though I do have concerns about how widely-used their formatting is; one of the challenges with building a workflow is ensuring that the structure of your incoming data won’t change.

The Arrogance of "Noise"

This is a post about communication. One of the through-lines of my academic and professional career is conflict between entrenched subject matter experts (SMEs) and hot-shot quantitative analysts. As a young undergraduate, I followed Baseball Prospectus and Fangraphs through the SABRmetric revolution. I watched Nate Silver bring data-driven prognostication to the world of political journalism, which had previously been (and arguably still is) dominated by punditry. In my current job, I work with experienced analysts who have often been working on the same systems for years.

Data Science and Data Generation Processes

I was talking about a curriculum for a new Data Science degree program, and the topic of experimental design came up. Design of Experiments (DOE) is a classical subject area for statisticians, and in the context of an applied statistics master’s degree it makes perfect sense, but in the context of data science, it seemed pretty out of place. I say that not because DOE isn’t important but because I think it’s something “data science” doesn’t often consider.

Calibration update, now with Brier Scores!

After reading my previous post on calibration, my clever wife (who’s been doing calibration-related activities in the context of modeling and simulation) brought to my attention the concept of Brier Scores. (Alternatively, here.) This approach was originally proposed to evaluate weather forecasts (“Verification of weather forecasts has been a controversial subject for more than half a century,” so at least we’ve moved on to controversial climate forecasts in this half-century.
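For anyone who hasn't run into Brier scores before, here's a minimal sketch in Python, assuming binary outcomes coded as 0/1 and forecasts given as probabilities. The function name and the toy numbers are made up for illustration and aren't taken from the post itself.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better: 0 is a perfect forecaster, and always saying 0.5
    scores 0.25 no matter what happens.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical example: three predictions and whether each came true.
toy_forecasts = [0.9, 0.7, 0.2]
toy_outcomes = [1, 0, 0]
print(brier_score(toy_forecasts, toy_outcomes))
```

The score rewards confident predictions that turn out right and punishes confident ones that turn out wrong, which is why it's a natural companion to a calibration check.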

Is Scott well-calibrated?

Yesterday, Scott Alexander posted his annual predictions review post. I always enjoy this post because it’s externalized introspection. Scott takes the time to formally look at things he thought, consider how right he was about these things, and consider how it should update his thinking moving forward. Most people don’t do this informally, let alone formally! I want to respond to two things in the post, the latter of which is answering the question Scott only implies of whether he’s well-calibrated or not.

What is random, really?

I wanted to talk a little bit more about how different metrics account for variation in player performance, and various flavors of NBA plus/minus statistics provide nice examples. This is building a bit off of some concepts I discussed in Choosing the right metric for sports. Plus/Minus Metrics: One approach for estimating individual player contribution to overall outcome is to look at the net points scored while an individual player was on the court.
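To make "net points while a player was on the court" concrete, here's a rough Python sketch of raw plus/minus. It assumes the game has already been chopped into stints (stretches with a fixed set of players on the floor), each recorded as the players on the court, points scored by their team, and points allowed. That data layout, the function name, and the player labels are all hypothetical, not the format of any real play-by-play feed.

```python
from collections import defaultdict


def raw_plus_minus(stints):
    """Sum the team's net points over every stint a player was on the court.

    `stints` is an iterable of (players, points_for, points_against) tuples,
    a made-up layout used only for this illustration.
    """
    totals = defaultdict(int)
    for players, points_for, points_against in stints:
        net = points_for - points_against
        for player in players:
            totals[player] += net
    return dict(totals)


# Hypothetical two-stint example with placeholder player labels.
toy_stints = [
    ({"A", "B"}, 10, 6),  # A and B on the floor: team is +4
    ({"A", "C"}, 4, 9),   # A and C on the floor: team is -5
]
print(raw_plus_minus(toy_stints))
```

Notice that every player in a stint gets the same credit or blame, regardless of what they actually did, which is exactly the kind of variation the fancier plus/minus flavors try to account for.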


When I was in school, I was an applied Statistician, but now folks would probably call me a “Data Scientist.” I’m empirical in my approach and generally skeptical of eminence-based decision-making. I’m a fan of Nate Silver, Douglas Adams, and Boris Diaw. I work in Defense, so the content of this blog will be about things that are mostly or entirely unrelated to my day job. I’m hoping that forcing myself to write things down will help me interrogate my own thought processes and biases.

Choosing the right metric for sports

I think it’s great that sports statistics are a big thing in popular media. It makes fans and media better informed about their team and players, and it provides an entry point for people to get interested in statistics. That said, there seems to be a perpetual innumeracy in the way folks talk about a lot of these metrics. One thing I see come up repeatedly is the distinction between metrics that look at a player’s past performance and say, “How important was that player to the team’s success?

Analysis Philosophy

From what I understand, teaching professors (maybe all professors?) write a “Philosophy of Teaching Statement”. The goal of this document is to describe the individual’s approach to teaching and why they use that approach. It basically says, “Here’s what I do, and here’s why I do it,” and maybe you get into a little bit of how they got there. In my view, it’s akin to a teaching world-view. It’s an opportunity for a professor to organize (and interrogate a little) their entire approach to their chosen profession.