Failure to Communicate

Bad Data Science in the Wild

Today’s example comes from a Reddit post on the USMNT subreddit that shows the proportion of minutes played by US Men’s National Team (USMNT) players who participated in the January mini-camp the USMNT holds every year. OP made the following plot:

IDK what this actually means, but I sure know what people will think when they see it!

Background: The context here is that fans are generally dissatisfied with the USMNT right now, and one of the reasons is that Gregg Berhalter (the USMNT coach) doesn’t call up the right players.

Pie Charts: A Journey

As a newly minted PhD statistician, I was hired by a company that didn’t have a lot of native statistical expertise because they wanted to change that. As a result, I felt empowered to give lots of opinions on topics within my domain to anyone who happened to be in the room, including the head of the division. One of those opinions was that pie charts were the worst. I viewed pie charts as the scarlet letter of bad analysis: having one in your analysis should get you shamed and shunned.

Nonlinearity

This is an update to my Analysis Philosophy page, which is still working towards completion.

Nonlinearity is a commonly misunderstood problem in data analysis, mostly because our profession has once again managed to use a simple-sounding term in a way that’s counterintuitive to lay audiences. (See also "Artificial Intelligence" is dumb.) When people think about nonlinear response variables, they think of functions that have nonlinear relationships.

Distributionality

This is an update to my Analysis Philosophy page, which is still working towards completion.

I only get 1,750 hits on Google when I search for “Distributionality”, so maybe I should clarify what I mean, though I don’t think it’s anything profound. That data follow distributions is a tautology. When this doesn’t appear to be the case, it means we’ve failed to properly model the data generation function. The most typical failure mode is to assume that the distribution is simpler than it is.

"Artificial Intelligence" is dumb

I mean the term of art, not the concept or field of study. And what’s dumb is how it’s applied. “Machine learning” is also dumb in a similar way.

Some definitions for AI: You can go back to the beginning of the field if you want to, but modern definitions tend to be vague. There are good definitions out there, but these sound esoteric, and unless you’re really interested in defining AI precisely, you’ll probably just stick with Merriam-Webster or Wikipedia, which means literally:

The Arrogance of "Noise"

This is a post about communication. One of the through-lines of my academic and professional career is conflict between entrenched subject matter experts (SMEs) and hot-shot quantitative analysts. As a young undergraduate, I followed Baseball Prospectus and Fangraphs through the sabermetric revolution. I watched Nate Silver bring data-driven prognostication to the world of political journalism, which had previously been (and arguably still is) dominated by punditry. In my current job, I work with experienced analysts who have often been working on the same systems for years.

Projected records and rankings aren't equivalent

Nate Duncan’s “Dunc’d On” is probably my favorite NBA podcast. He and frequent co-host Danny Leroux are analytical and comprehensive, covering the whole league. About every other week, they’ll go through every team in a conference (East or West) and talk about how each team is doing, where they’re projected to finish, etc. They call these episodes “15 in 60”, although they don’t always get to all 15 teams in the conference, and I don’t think they’ve ever done one of these in 60 minutes.