# Quick notes on binomial confidence intervals in R

I’m teaching a graduate-level intro stats course right now, and one thing that struck me as we move from calculating things “by hand” to doing things in `R`

is that there’s no real reason to emphasize the normal approximation binomail confidence interval once you’re using software. Or at least far less reason.

## The normal approximation

This is the basic interval they’ve taught in introductory statistics courses since time immamorial. Or at least the past few decades, I’d have to know the history of Stats Ed to give the real timeframe. Anyway, this confidence interval uses the fact from the Central Limit Theorem, that, as \(n \rightarrow \infty\), the sampling distribution for \(\hat\pi = x/n\) closely resembles a Normal distribution.

Based on that, you get the equation:

\[\hat\pi \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat\pi (1 - \hat\pi)}{n}}\]

### Analog CI

We can build this CI in R pretty easily by inputting the values for the sample size, \(n\), and the number of “successes” or “1”s from our binary response variable. One example from class discusses a poll of 2500 people with 400 responding “Satisfactory”. For a 90% confidence interval, we have:

```
n <- 2500
x <- 400
pihat <- x/n
alpha <- 0.1 # 90% CI --> alpha = 1 - .9 = 0.1
lower_bound <- pihat + qnorm(alpha/2) * sqrt((pihat * (1 - pihat)/n))
upper_bound <- pihat + qnorm(1 - alpha/2) * sqrt((pihat * (1 - pihat)/n))
c(lower_bound, upper_bound)
```

`## [1] 0.1479397 0.1720603`

### Easy mode

But it’s much easier to just use the `binom`

library, which contains the function `binom.confint()`

:

```
# install.packages("binom")
library(binom)
binom.confint(x = 400, n = 2500, conf.level = 0.9, method = "asymptotic")
```

```
## method x n mean lower upper
## 1 asymptotic 400 2500 0.16 0.1479397 0.1720603
```

Much easier! But now that we’re using `binom.confint()`

, we discover that we have to specify `method = "asymptotic"`

. But that implies that there are alternatives! And indeed, if we just remove that statement, we see that there are almost a DOZEN different methods that `binom.confint()`

will compute for you!

## Other types of binomial confidence intervals

First off, most of these aren’t useful in most cases. They’re in there because (1) they’re not very hard to program, so the authors figured, “Why not?” and (2) in most cases, there *is* at least one circumstance where each one is the best option. (Or they’re included for historical reasons.)

### Exact CIs, aka Clopper-Pearson

For one simple example, recall the assumption that we always have to make for our Normal approximation method: \(n * \hat\pi > 5\) and \(n * (1 - \hat\pi) > 5\). This is required when we use the Normal approximation. It means we can’t build CIs for small-ish samples. But other methods don’t have this problem!

`method = "exact"`

uses what’s called the Clopper-Pearson method, which uses the Binomial distribution to calculate an “exact” confidence interval rather than rely on an approximation.

While being “exact” sounds better than “approximate”, the truth of the matter is that the Clopper-Pearson interval is generally wider than it needs to be, meaning you get a less precise interval:

```
library(dplyr)
binom.confint(x = 400, n = 2500, conf.level = 0.9) %>%
mutate(`CI Width` = upper - lower) %>%
select(method, lower, upper, `CI Width`) %>%
arrange(`CI Width`)
```

```
## method lower upper CI Width
## 1 bayes 0.1480550 0.1721635 0.02410856
## 2 cloglog 0.1481500 0.1722628 0.02411279
## 3 profile 0.1481871 0.1723036 0.02411651
## 4 wilson 0.1483082 0.1724269 0.02411870
## 5 probit 0.1482369 0.1723573 0.02412042
## 6 asymptotic 0.1479397 0.1720603 0.02412053
## 7 logit 0.1483044 0.1724312 0.02412679
## 8 agresti-coull 0.1483026 0.1724325 0.02412988
## 9 lrt 0.1481877 0.1723265 0.02413880
## 10 exact 0.1480388 0.1725544 0.02451559
## 11 prop.test 0.1459601 0.1750977 0.02913765
```

Since we have a large sample, the differences aren’t very large, but there are times when you want every ounce of precision you can get!

### Bayesian intervals

Bayesian statistics is a school of thought that says we should try to incorporate our prior knowledge about a problem when making a decision instead of letting the data stand on its own.I don’t want to get into why some folks prefer Bayesian intervals, but if you want to, just specify `method = "bayes"`

to get a Bayesian CI.

### A good general-use CI

My go-to for a simple binomial confidence interval is the Agresti-Coull method, `method = "agresti-coull"`

. It’s one of the weirder ones (Seriously, go look at the equation for it!), but generally performs as well or better than the competition across most scenarios. It’s more precise than `method = "exact"`

, doesn’t fail in small samples like `method = "asymptotic"`

, and doesn’t rely on a Bayesian approach.