4.7 Sampling & Confidence

Random Sampling


You work for the President and you want to estimate the fraction \(p\) of voters in the entire nation who will prefer him in the upcoming election. You do this by random sampling. Specifically, you select \(n\) voters independently and uniformly at random, ask each of them who they are going to vote for, and use the fraction \(P\) of those who say they will vote for the President as an estimate of \(p\).
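The sampling procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the true fraction `true_p` (unknown in a real poll), the sample size 3125, and the function names are hypothetical choices, and the population size 250,000,000 is the figure used later in this section. Because the draws are independent, the same voter can be drawn more than once, so the sketch also computes the probability of that happening.

```python
import math
import random

def run_poll(n, p, rng):
    """Return the sample fraction P from n independent draws.

    Each draw picks a voter uniformly at random (with replacement), so it
    favors the President with probability p, independently of the others.
    """
    favorable = sum(1 for _ in range(n) if rng.random() < p)
    return favorable / n

def prob_some_repeat(n, population):
    """Probability that n independent uniform draws pick some voter twice."""
    if n > population:
        return 1.0
    # Pr[all distinct] = prod_{k=0}^{n-1} (1 - k/population); sum the logs.
    log_all_distinct = sum(math.log1p(-k / population) for k in range(n))
    return 1.0 - math.exp(log_all_distinct)

rng = random.Random(0)   # fixed seed so the run is reproducible
true_p = 0.55            # hypothetical value; a real pollster does not know it
n = 3125                 # hypothetical sample size

print(f"P = {run_poll(n, true_p, rng):.3f}")
print(f"Pr[some voter drawn twice] = {prob_some_repeat(n, 250_000_000):.4f}")
```

Running it again with a different seed gives a different value of \(P\) while `true_p` stays fixed, which is the distinction between the random variable and the constant that the questions below rely on.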

  1. Our theorems about sampling and distributions allow us to calculate how confident we can be that the random variable \(P\) takes a value near the constant \(p\). This calculation uses some facts about voters and the way they are chosen.

    Which of the following facts are true?

    Exercise 1

    The preference of any particular voter is a constant: either "the President" or "not the President", so (1) is false and (2) is true. (3) is false; in fact, the Birthday "paradox" implies that the probability of some voter being chosen more than once approaches 1 once \(n\) grows well beyond the square root of the number of voters (about 16,000 for a nation of 250,000,000 voters). (4) holds by definition of randomly choosing an item from a set. (5) is false because successive voters in the sequence are chosen independently. (6) is true because, for example, the fractions of voters who prefer the President in the largest states may all be \(< p\).

  2. Suppose that, according to your calculations, the following is true about your polling:

    \[\Pr[ |P-p| \leq 0.04 ] \geq 0.95\]

    You run the poll, you count how many respondents said they will vote for the President, you divide by \(n\), and you get 0.53. You call the President, and... what do you say?

    Exercise 2

    You cannot say (1): the only way to know the exact value of the constant \(p\) is to ask all 250,000,000 voters.

    You cannot say (2) either: \(p\) is a constant, so it either is or is not within 0.04 of 0.53. If it is, then the probability of that event is 1, which is at least 0.95, and (2) is true. If it is not, then the probability is 0, which is smaller than 0.95, and (2) is false. Since you do not know which case holds, you cannot assert (2).

    You can say (3). To see why, start with the statement

    either \(|0.53 - p| \leq 0.04\) or \(|0.53 - p| > 0.04\),

    which is obviously true. Now read it as follows: either \(p\) is within 0.04 of 0.53, or it is not, in which case my random variable \(P\) took a value from a set that is hit at most 5 times in 100. So, clearly, either \(p\) is within 0.04 of 0.53 or something strange has happened.

    You can say (4): by rephrasing (2) in terms of "confidence" rather than probability, you correctly indicate that you are talking about the probable behavior of your methodology for estimating \(p\), not about the actual value of \(p\).
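The reading of confidence as a property of the polling method can be checked by simulation. The following is a minimal sketch, not part of the original exercise: it fixes a hypothetical true fraction, repeats the poll many times, and counts how often \(P\) lands within 0.04 of \(p\). The sample size 3125 is an assumption, chosen because Chebyshev's bound with \(\mathrm{Var}[P] \leq 1/(4n)\) then gives \(\Pr[|P-p| > 0.04] \leq 1/(4 \cdot 3125 \cdot 0.04^2) = 0.05\); the values of `true_p` and `trials` are likewise illustrative.

```python
import random

def coverage(true_p, n, margin, trials, rng):
    """Fraction of simulated polls whose estimate P is within margin of true_p."""
    hits = 0
    for _ in range(trials):
        # One poll: n independent voters, each favorable with probability true_p.
        favorable = sum(1 for _ in range(n) if rng.random() < true_p)
        if abs(favorable / n - true_p) <= margin:
            hits += 1
    return hits / trials

rng = random.Random(1)  # fixed seed so the run is reproducible

# true_p, n, and trials are illustrative assumptions, not values from the text.
for true_p in (0.30, 0.50, 0.53):
    frac = coverage(true_p, n=3125, margin=0.04, trials=400, rng=rng)
    print(f"p = {true_p:.2f}: |P - p| <= 0.04 in {frac:.1%} of polls")
```

Within each run, `true_p` never changes; only the poll results vary. The fraction printed describes how often the procedure succeeds, which is what a confidence statement such as (4) claims, and says nothing about any single poll's outcome.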