Casual Inference

I was recently asked to share some views on sports analytics and aesthetics based on my post, How Analytics Ruins Sports. I’m not sure if anything will come of it, but I happened upon some interesting thoughts that I think are worth writing down here at a minimum. In this post, I’m going to give some short descriptions of general ideas that I hope to flesh out in future posts.

Not really, but sometimes! A few recent posts from Astral Codex Ten (ACX) make use of some stylized plots. Here’s an example:

[Figure: Example ACX plot]

This looks like … well, it looks like it was drawn using MS Paint. And I think that in this case, it’s the perfect tool for the task!

Tools matter

There are lots of good, general rules for graphical design. I’ve discussed a couple of ideas on here, and there are tons of resources out there.

When doing a regression analysis with categorical variables, which level is used as the reference level can be important. This is underappreciated, since most non-major classes on regression (or more precisely, regression classes that don’t show you the underlying matrix algebra) don’t talk about it. Software mostly hides this as well unless users want to dive deep into the options. Failing to consider your choice of reference level and how that choice can affect your analysis can lead you to erroneous (or at least dubious) conclusions.
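To make the point concrete, here is a minimal sketch in R. It uses the built-in `PlantGrowth` data set purely as a stand-in (it is an assumption for illustration, not data from the post) and refits the same model after changing the reference level with `relevel()`.

```r
# Default: the first factor level ("ctrl") is the reference level
fit_ctrl <- lm(weight ~ group, data = PlantGrowth)

# Same model, but with "trt1" as the reference level
pg <- PlantGrowth
pg$group <- relevel(pg$group, ref = "trt1")
fit_trt1 <- lm(weight ~ group, data = pg)

# The coefficients (and their p-values) answer different questions:
# each one compares a level against whichever level is the reference
coef(fit_ctrl)
coef(fit_trt1)

# The fitted values are identical, so it is the same model underneath
all.equal(fitted(fit_ctrl), fitted(fit_trt1))
```

The two fits describe the same group means, but the individual coefficient tests are comparisons against different baselines, so a contrast that looks unremarkable under one reference level can look significant under another.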

Very informative slide deck discussing SARS-CoV-2 from Dr. Michael Lin.

I’m teaching a graduate-level intro stats course right now, and one thing that struck me as we move from calculating things “by hand” to doing things in R is that there’s no real reason to emphasize the normal approximation binomial confidence interval once you’re using software. Or at least far less reason.

The normal approximation

This is the basic interval they’ve taught in introductory statistics courses since time immemorial. Or at least the past few decades; I’d have to know the history of Stats Ed to give the real timeframe.
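As a rough sketch of why software reduces the need for the normal approximation, the base R snippet below computes that interval by hand and compares it with two alternatives that are a single function call away. The counts (7 successes in 20 trials) are made up for illustration and do not come from the post.

```r
# Hypothetical data: 7 successes in 20 trials (illustrative only)
x <- 7
n <- 20
p_hat <- x / n

# Normal-approximation (Wald) interval, computed "by hand"
z <- qnorm(0.975)
wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)

# Exact (Clopper-Pearson) interval from base R
exact <- as.numeric(binom.test(x, n)$conf.int)

# Wilson score interval: prop.test() without the continuity correction
wilson <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

# Lower and upper limits for each method, side by side
rbind(wald = wald, exact = exact, wilson = wilson)
```

With small samples or proportions near 0 or 1, the hand-computed interval can differ noticeably from the other two, which is one reason to let the software do the work.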

Today’s example comes from a Reddit post on the USMNT subreddit that shows the proportion of minutes played by US Men’s National Team (USMNT) players who participated in the January mini-camp the USMNT does every year. OP made the following plot:

IDK what this actually means, but I sure know what people will think when they see it!

Background

The context here is that fans are generally dissatisfied with the USMNT right now, and one of the reasons is that Gregg Berhalter (the USMNT coach) doesn’t call up the right players.

A friend of mine shared an abstract with me from an upcoming talk by a Computer Science professor: Data analysis is an emerging research topic that focuses on understanding patterns of data to discover knowledge. For understanding the data, various machine learning (ML) techniques are commonly utilized to build learning models. For maintaining high performance of the models, it is important to extract good features and utilize them to build a reliable learning model.

library(tidyverse)
library(binom)

Someone had a relatively straightforward question: they had sets of binary outcomes for different response variables, and wanted to display them all in a simple way that highlighted both the probability of success and the amount of data they had for each observation. There are more than a few ways to do it, and it can be hard to determine which is best without seeing them, so let’s look at a few examples and see which we like!
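One possible starting point, sketched here under stated assumptions, is a point-and-interval plot with the sample size printed next to each estimate. The counts and response names below are invented for illustration, and the choice of Wilson intervals via `binom.confint()` is an assumption, not necessarily what the post uses.

```r
library(tidyverse)
library(binom)

# Made-up counts for four hypothetical response variables
outcomes <- tibble(
  response  = c("A", "B", "C", "D"),
  successes = c(3, 18, 45, 9),
  trials    = c(5, 30, 60, 40)
)

# Wilson intervals for each row; binom.confint() returns a data frame
# with columns including mean, lower, and upper
ci <- binom.confint(outcomes$successes, outcomes$trials, methods = "wilson")

plot_data <- outcomes %>%
  mutate(estimate = ci$mean, lower = ci$lower, upper = ci$upper)

# Point = estimated probability, bar = interval, label = amount of data
ggplot(plot_data, aes(x = estimate, y = response)) +
  geom_pointrange(aes(xmin = lower, xmax = upper)) +
  geom_text(aes(label = paste0("n = ", trials)), vjust = -1) +
  xlim(0, 1) +
  labs(x = "Estimated probability of success", y = NULL)
```

The interval width carries the "how much data" information implicitly, and the `n =` label states it explicitly for readers who do not think in terms of interval widths.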

Both Economics and Statistics share a peculiar failure mode: Many critical results in both rely on “large sample”/“long run average” proofs. The Central Limit Theorem is fundamental to much of classical statistics, including most (if not all) of the fundamental approaches that people are exposed to in their first few courses. The Efficient Market Hypothesis underpins much of the economic theory on which Western economies are based. Both are powerful tools for explaining common phenomena and often make complex problems simpler to understand and model.

I recently had to buy a car, and one of the trickiest things I found was figuring out how to decide between buying new, buying used (how used?) and leasing. Growing up, my parents never bought a new car. To my knowledge, the only new car they ever bought was the one my dad bought right after he graduated college. My wife’s parents, OTOH, buy new almost exclusively. They typically own a car for a few years (3-5), then trade it back in to the dealer and get a new vehicle.

Brief Thoughts on Sports Analytics and Aesthetics

MS Paint > ggplot?

Reference levels matter

Corona Virus information

Quick notes on binomial confidence intervals in R

Bad Data Science in the Wild

Computer Science Invents Wavelets

Plotting binary outcomes

The common failure mode of statistics and economics

A simple model for car ownership