Statistical inference is how we draw conclusions about a population from a sample. It’s like being a detective: we never have all the information, but we can make educated guesses based on the evidence we have.
A person claims to possess ESP (extrasensory perception) abilities that enable them to predict coin flips. This scenario illustrates the fundamental logic of statistical hypothesis testing.
Statistical hypothesis testing follows a logic similar to proof by contradiction in mathematics:
The null hypothesis (H₀) serves as our “assumption to be disproven” and typically represents the status quo, “no effect,” or pure chance.
In our ESP case: Random guessing (p = 0.5)
The alternative hypothesis (H₁) represents what we suspect might be true:
In our ESP case: Better than random guessing (p > 0.5)
We establish a conventional cutoff point (α) that defines “extremely unlikely”, commonly α = 0.05 or α = 0.01.
The p-value quantifies the logical argument:
P-value (statistical significance): In statistical hypothesis testing, the p-value is the probability of obtaining test results (outcomes) at least as extreme as the result actually observed, under the assumption that the null hypothesis (H0) is correct. A very small p-value means that, if the null hypothesis were true, data as extreme as or more extreme than what we actually observed would be very unlikely (the empirical outcome “contradicts” H0). The smaller the p-value, the stronger the statistical evidence against the null hypothesis, leading us to reject H0 at a predetermined significance level (cut-off or threshold probability) such as 0.05 or 0.01, while recognizing that these thresholds are conventions rather than mathematically derived boundaries.
Statistical Hypotheses: \begin{align*} H_0&: p = 0.5 \text{ (random guessing)} \\ H_1&: p > 0.5 \text{ (better than guessing)} \end{align*}
Probability Calculation:
For 70 successes out of 100 trials:
\text{P-value} = P(X \geq 70) = \sum_{k=70}^{100} \binom{100}{k}(0.5)^k(0.5)^{100-k} \approx 0.0000393
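The same tail probability can be computed in R with pbinom (a minimal sketch; the variable name is ours):

```r
# P(X >= 70) for 100 trials under H0: p = 0.5
# pbinom(69, ...) gives P(X <= 69), so the complement is P(X >= 70)
p_value <- 1 - pbinom(69, size = 100, prob = 0.5)
print(signif(p_value, 3))
```

This agrees with the hand calculation of approximately 0.0000393.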
Decision Framework: \text{Decision Rule} = \begin{cases} \text{Reject H}_0 & \text{if p-value} < 0.05 \\ \text{Fail to reject H}_0 & \text{if p-value} \geq 0.05 \end{cases}
0.0000393 < 0.05 (significance level)
This means that under the null hypothesis (pure guessing), a result at least this extreme would occur only about 4 times in 100,000 experiments, so we reject H₀.
The binomial test is a hypothesis test used when you have binary (two-outcome) trials, where each trial is independent and has the same probability of success. It tests whether the observed proportion of successes differs significantly from an expected probability under the null hypothesis. For example: Testing whether a coin is fair by checking if the proportion of heads in 100 flips differs significantly from the expected probability of 0.5 under the null hypothesis.
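Base R implements this exact test as binom.test. As a sketch, the 60-heads count below is an illustrative number of our own, not taken from the text:

```r
# Two-sided exact binomial test: is a coin fair?
# Hypothetical data: 60 heads observed in 100 flips
result <- binom.test(x = 60, n = 100, p = 0.5, alternative = "two.sided")
result$p.value
```

With 60 heads the two-sided p-value comes out just above 0.05, so we would narrowly fail to reject fairness at the conventional level.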
A p-value is a probability that captures how extreme our observed data is relative to a null hypothesis:
The p-value is the probability of obtaining the observed outcome, or a more extreme one in the direction of the alternative hypothesis, assuming the null hypothesis (H₀) is true.
The choice between one-tailed and two-tailed tests depends on your alternative hypothesis and the context of your research question:
One-Tailed Tests:
Two-Tailed Tests:
Example: testing whether a politician is overestimating support of 98% (p = 0.98) after observing 13 supporters among n = 15 people surveyed. In this context, a one-tailed test is most appropriate because the research question is inherently directional (overestimating implies p < 0.98).
\begin{align*} H_0&: p = 0.98 \\ H_1&: p < 0.98 \end{align*}
P-value calculation: \begin{align*} \text{p-value} &= P(X \leq 13 \mid H_0) \\ &= 1 - P(X \geq 14 \mid p = 0.98) \\ &= 0.0353 \end{align*}
Remember: The p-value quantifies evidence against H₀ but should be considered alongside practical significance and effect size.
Proof that \sqrt{2} is Irrational
Initial Assumption
If \sqrt{2} is rational, then \sqrt{2} = \frac{p}{q} where p and q are integers with no common factors (the fraction is in lowest terms) and q \neq 0.
Algebraic Steps
Starting with \sqrt{2} = \frac{p}{q}
Square both sides: 2 = \frac{p^2}{q^2}
Multiply both sides by q^2: 2q^2 = p^2
Properties of p and q
Since p^2 = 2q^2, p^2 is even, which forces p itself to be even. Write p = 2k; substituting gives 2q^2 = 4k^2, so q^2 = 2k^2 and q must be even as well.
Contradiction
Both p and q are divisible by 2, contradicting the assumption that \frac{p}{q} was in lowest terms.
Thus, \sqrt{2} is irrational.
We use a similar but probabilistic approach:
Key Difference: We deal with probability, not certainty.
For a coin example:
Think of H₀ as the “innocent until proven guilty” assumption.
Suppose we flip a coin 100 times:
Ask: “If the coin were truly fair (H₀ true), how likely is this result?”
This is like asking in a legal case:
In hypothesis testing, we can make two types of errors:
| Context | Type I Concern | Type II Concern | Typical α |
|---|---|---|---|
| Criminal Justice | Convict innocent | Free guilty | 0.001 |
| Medical Testing | Unnecessary treatment | Miss disease | 0.01 |
The framework of statistical hypothesis testing as we know it today was largely developed by Jerzy Neyman, a Polish mathematician and statistician, in collaboration with Egon Pearson. Born in Bendery, Imperial Russia (now Moldova), Neyman made fundamental contributions to statistics that transformed both theoretical foundations and practical applications.
His most significant contributions include:
The potential outcomes framework, first introduced by Neyman in his 1923 master’s thesis on agricultural experiments, revolutionized how we think about causality in statistics. This framework, later rediscovered and expanded by Donald Rubin (hence sometimes called the Neyman-Rubin causal model), introduced the concept of comparing potential outcomes that would occur under different treatments. For each unit, Neyman conceived of multiple potential outcomes, only one of which could be observed - a fundamental concept now known as the “fundamental problem of causal inference.”
His approach to statistical inference differed notably from R.A. Fisher’s significance testing, leading to important debates that helped shape modern statistical theory and practice.
The potential outcomes framework he introduced has become particularly influential in modern causal inference, epidemiology, and social sciences research. The impact of his contributions continues to be felt in how we approach statistical inference, experimental design, and causal analysis today.
p-value = P(seeing this evidence or more extreme | H₀ is true)
Like asking:
Observe 8 heads in 10 flips:
Assume H₀: p = 0.5 (fair coin)
Calculate: P(X ≥ 8) = P(8 heads) + P(9 heads) + P(10 heads) = \binom{10}{8}(0.5)^8(0.5)^2 + \binom{10}{9}(0.5)^9(0.5)^1 + \binom{10}{10}(0.5)^{10} ≈ 0.055
Interpret: the p-value (≈ 0.055) sits just above the conventional 0.05 cutoff, so the evidence against fairness is suggestive but not conclusive.
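As a quick sketch in base R, the tail sum above collapses to a single pbinom call:

```r
# P(X >= 8) in 10 flips of a fair coin
p_value <- 1 - pbinom(7, size = 10, prob = 0.5)
print(p_value)  # 56/1024 = 0.0546875
```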
Historical reasons: the 0.05 threshold traces back to R.A. Fisher’s convention and has persisted largely by tradition rather than mathematical necessity.
Consider the p-value on a continuous scale: p = 0.049 and p = 0.051 represent nearly identical strength of evidence, even though they fall on opposite sides of the conventional cutoff.
Always consider: effect size, sample size, and practical significance alongside the p-value.
An election candidate believes she has the support of 50% (p = 0.5) of the residents in a particular town. A researcher suspects this might be an underestimation and conducts a survey. The researcher asks 10 people whether they support the candidate or not; 7 people say that they do (70% in a sample).
Calculate the p-value and decide whether there is enough evidence to reject H0 using data from the sample (assuming the critical probability = 5%).
Hypotheses: \begin{align*} H_0&: p = 0.5 \text{ (the claimed support)} \\ H_1&: p > 0.5 \text{ (support is being under-estimated)} \end{align*}
Data: n = 10, x = 7 (\hat{p} = 0.7), significance level \alpha = 0.05
For a one-sided test, the p-value is the probability of observing 7 or more successes out of 10 trials, assuming H_0 is true. Using the binomial distribution:
P(X \geq 7) = P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)
This is a one-tailed (one-sided) test because we’re specifically interested in whether the candidate is under-estimating her support. In statistical terms:
Our research question only concerns under-estimation, so we only need to consider evidence in that direction (values greater than 50%). This is reflected in our alternative hypothesis H_1: p > 0.5.
We can’t just calculate P(X = 7) because the p-value must include every outcome at least as extreme as the one observed: 8, 9, or 10 successes would be even stronger evidence for H_1.
Therefore, we must sum:
P(X \geq 7) = P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)
If we only calculated P(X = 7) = 0.1172, we would ignore these other possible outcomes that also support H_1, leading to an incorrect p-value.
For each value k, we use the binomial probability formula:
P(X = k) = \binom{n}{k} p_0^k (1-p_0)^{n-k}
Let’s calculate each term:
P(X = 7) = \binom{10}{7}(0.5)^7(0.5)^3 = 120 \times (0.5)^{10} \approx 0.1172
P(X = 8) = \binom{10}{8}(0.5)^8(0.5)^2 = 45 \times (0.5)^{10} \approx 0.0439
P(X = 9) = \binom{10}{9}(0.5)^9(0.5)^1 = 10 \times (0.5)^{10} \approx 0.0098
P(X = 10) = \binom{10}{10}(0.5)^{10} = 1 \times (0.5)^{10} \approx 0.0010
P-value = 0.1172 + 0.0439 + 0.0098 + 0.0010 = 0.1719 (17.19%)
Since the p-value (0.1719) is greater than the significance level (0.05), we fail to reject the null hypothesis.
There is not enough evidence to conclude that the candidate is under-estimating her support. While the sample shows 70% support (higher than 50%), this difference could reasonably occur by chance even if the true support was only 50%. The relatively small sample size (n = 10) makes it harder to detect real differences.
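The hand calculation can be verified in R; this sketch uses the survey numbers above (n = 10, x = 7, p0 = 0.5):

```r
# Right-tailed test: P(X >= 7) = 1 - P(X <= 6)
p_value <- 1 - pbinom(6, size = 10, prob = 0.5)
round(p_value, 4)  # 0.1719
```

binom.test(7, 10, p = 0.5, alternative = "greater") returns the same p-value.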
An election candidate believes she has the support of 40% (p = 0.4) of the residents in a particular town. A researcher suspects this might be an overestimation and conducts a survey. The researcher asks 20 people whether they support the candidate or not; 3 people say that they do (15% in a sample). Calculate the p-value and decide whether there is enough evidence to reject H0 using data from the sample (assuming the critical probability = 5%).
Hypotheses: \begin{align*} H_0&: p = 0.4 \text{ (the claimed support)} \\ H_1&: p < 0.4 \text{ (support is being over-estimated)} \end{align*}
Data: n = 20, x = 3 (\hat{p} = 0.15), significance level \alpha = 0.05
This is a one-tailed test because we’re specifically interested in whether the candidate is over-estimating her support. We only care about evidence suggesting the true proportion is less than 40%, leading to a left-tailed test.
For this left-tailed test, the p-value is the probability of observing 3 or fewer successes out of 20 trials, assuming H_0 is true. Using the binomial distribution:
P(X \leq 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
For each value k, we use the binomial probability formula: P(X = k) = \binom{n}{k} p_0^k (1-p_0)^{n-k}
\binom{20}{0} = 1
\binom{20}{1} = 20
\binom{20}{2} = \frac{20 \times 19}{2 \times 1} = 190
\binom{20}{3} = \frac{20 \times 19 \times 18}{3 \times 2 \times 1} = 1,140
For k = 0: P(X = 0) = \binom{20}{0}(0.4)^0(0.6)^{20} = 1 \times 1 \times 0.6^{20} \approx 0.0000366
For k = 1: P(X = 1) = \binom{20}{1}(0.4)^1(0.6)^{19} = 20 \times 0.4 \times 0.6^{19} \approx 0.0004875
For k = 2: P(X = 2) = \binom{20}{2}(0.4)^2(0.6)^{18} = 190 \times 0.16 \times 0.6^{18} \approx 0.0030874
For k = 3: P(X = 3) = \binom{20}{3}(0.4)^3(0.6)^{17} = 1,140 \times 0.064 \times 0.6^{17} \approx 0.0123497
Sum all probabilities: P(X \leq 3) = \sum_{k=0}^3 P(X = k) = 0.0000366 + 0.0004875 + 0.0030874 + 0.0123497 = 0.0159612
Decision rule: reject H_0 if the p-value is less than \alpha = 0.05.
Since the p-value is less than the significance level (0.05), we reject the null hypothesis.
There is sufficient evidence at the 5% significance level to conclude that the candidate is overestimating her support. The sample shows only 15% support, which is significantly lower than the candidate’s belief of 40%. The probability of observing such low support (3 or fewer out of 20) would be only about 1.6% if the true support were actually 40%.
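The four hand-computed terms can be summed in R with dbinom (a sketch with this example’s numbers):

```r
# Sum the four left-tail probabilities: P(X <= 3) with n = 20, p0 = 0.4
p_value <- sum(dbinom(0:3, size = 20, prob = 0.4))
round(p_value, 4)  # 0.016
```

The same value comes from pbinom(3, size = 20, prob = 0.4) directly.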
A political candidate claims that 40% of residents in a town support her campaign (p = 0.4). A researcher suspects this might be an overestimation and conducts a survey. In a random sample of 12 residents, 1 person expresses support for the candidate. Test whether there is sufficient evidence to conclude that the candidate is overestimating her support level, using a significance level of 5%.
\begin{align*} H_0&: p = 0.4 \text{ (The candidate's claim is correct)} \\ H_1&: p < 0.4 \text{ (The candidate is overestimating support)} \end{align*}
For a left-tailed test, we calculate the probability of observing 1 or fewer successes under H_0.
Using the binomial probability formula: P(X \leq 1) = \sum_{k=0}^{1} \binom{12}{k}(0.4)^k(0.6)^{12-k}
We find: P(X \leq 1) = 0.0196
Decision Rule: since the p-value (0.0196) is less than \alpha = 0.05, we reject H_0.
At a 5% significance level, there is sufficient evidence to conclude that the candidate is overestimating her support. The sample proportion (8.3%) is substantially lower than the claimed 40% support, and this difference is statistically significant (p = 0.0196).
While the sample size (n = 12) is relatively small, we were still able to detect a significant difference. This is because the observed difference between the claimed proportion (40%) and sample proportion (8.3%) was quite large. However, a larger sample size would provide more reliable results and better estimation of the true support proportion.
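The p-value above can be reproduced in R with a single pbinom call (a sketch with this example’s numbers):

```r
# Left-tailed test: P(X <= 1) with n = 12, p0 = 0.4
p_value <- pbinom(1, size = 12, prob = 0.4)
round(p_value, 4)  # 0.0196
```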
An election candidate claims that 20% of residents in a town support her campaign. A researcher believes the candidate might be over-estimating her support and wants to test this claim. In a random sample of 12 residents, 4 people express support for the candidate. Test whether there is sufficient evidence to conclude that the candidate is over-estimating her support level, using a significance level of 5%.
Given: n = 12, x = 4, p_0 = 0.2, \alpha = 0.05
Step 1: State the Hypotheses
Since we want to test if the candidate is over-estimating (true proportion is less than claimed):
\begin{align*} H_0&: p = 0.2 \text{ (The candidate's claim is correct)} \\ H_1&: p < 0.2 \text{ (The candidate is overestimating support)} \end{align*}
Step 2: Choose the Test Statistic
We use the number of successes (X) in the sample, where X follows a binomial distribution with n = 12 and p = 0.2 under H₀.
Observed value: x = 4
Step 3: Calculate the Test Statistic and P-value
For a left-tailed test, we calculate:
P(X \leq 4) = \sum_{k=0}^{4} \binom{12}{k}(0.2)^k(0.8)^{12-k}
```r
# Calculate p-value
p_value <- pbinom(4, size = 12, prob = 0.2)
print(paste("P-value =", round(p_value, 4)))
```

```
[1] "P-value = 0.9274"
```
The p-value is 0.9274
Step 4: Decision Rule
Since the p-value (0.9274) is far greater than \alpha = 0.05, we fail to reject H_0.
Step 5: Interpretation
At a 5% significance level, there is not enough evidence to conclude that the candidate is over-estimating her support. In fact, the sample data shows 4/12 ≈ 33.3% support, which is higher than her claimed 20%, going in the opposite direction of our alternative hypothesis.
Sample Proportion: \hat{p} = \frac{x}{n} = \frac{4}{12} = 0.333
The high p-value reflects that the sample proportion (33.3%) is actually higher than the hypothesized value (20%), not lower as we were testing for.
If we had suspected under-estimation rather than over-estimation, we should have set up the test with H₁: p > 0.2.
Given the small sample size (n = 12), the power of this test to detect true differences is limited.
Here’s the complete R code for this analysis:
```r
# Given values
n <- 12        # sample size
x <- 4         # number of successes
p0 <- 0.2      # hypothesized proportion
alpha <- 0.05  # significance level

# Calculate p-value for left-tailed test
p_value <- pbinom(x, size = n, prob = p0)

# Calculate sample proportion
p_hat <- x/n

# Print results
cat("Sample proportion =", round(p_hat, 3), "\n")
cat("P-value =", round(p_value, 4), "\n")
cat("Decision: ", ifelse(p_value < alpha, "Reject H0", "Fail to reject H0"), "\n")
```

```
Sample proportion = 0.333
P-value = 0.9274
Decision:  Fail to reject H0
```
A politician believes that support for his country’s EU membership is about 98% (p = 0.98). A researcher wants to test whether the politician is overestimating this level of support.
In a sample of 15 people (n = 15), the researcher observes that 13 people support membership. Let’s define the random variable X as the number of people in the sample who support EU membership. We observed X = 13 “successes” in 15 Bernoulli trials.
Is there enough evidence to reject the claim that the support is 98%?
Hypotheses: \begin{align*} H_0&: p = 0.98 \text{ (the claimed support)} \\ H_1&: p < 0.98 \text{ (support is being over-estimated)} \end{align*}
Data: n = 15, x = 13 (\hat{p} \approx 0.867), significance level \alpha = 0.05
For this left-tailed test, we need P(X \leq 13). Given the high value of p_0, it’s more efficient to use the complement rule:
P(X \leq 13) = 1 - P(X \geq 14)
The complement rule in probability states that P(A) = 1 - P(not A), because P(A) + P(not A) = 1. For a left-tailed test, we need P(X ≤ 13). Instead of summing P(X = 0) + P(X = 1) + … + P(X = 13), it’s easier to: calculate P(X ≤ 13) = 1 - P(X > 13), or P(X ≤ 13) = 1 - P(X ≥ 14).
For this left-tailed test: P(X \leq 13) = 1 - P(X > 13) = 1 - P(X = 14) - P(X = 15)
Let’s calculate step by step:
\binom{15}{14} = \frac{15!}{14!(15-14)!} = \frac{15}{1} = 15
P(X = 14) = \binom{15}{14}(0.98)^{14}(0.02)^1 = 15 \times (0.98)^{14} \times 0.02 = 15 \times 0.75364... \times 0.02 \approx 0.2261
\binom{15}{15} = 1
P(X = 15) = \binom{15}{15}(0.98)^{15}(0.02)^0 = 1 \times (0.98)^{15} \times 1 = (0.98)^{15} \approx 0.7386
P(X \leq 13) = 1 - P(X = 14) - P(X = 15) = 1 - 0.2261 - 0.7386 \approx 0.0353
Since the p-value (0.0353) is less than the significance level (0.05), we reject the null hypothesis.
There is sufficient evidence to conclude that the politician is overestimating the support for EU membership. While 86.7% support in the sample is still very high, it’s significantly lower than the politician’s claim of 98%. Under the assumption that true support is 98%, the probability of observing 13 or fewer supporters in a sample of 15 people would be only about 3.53%.
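In R, pbinom computes the left tail directly, handling the complement internally; a sketch with this example’s numbers:

```r
# Left-tailed test: P(X <= 13) with n = 15, p0 = 0.98
p_value <- pbinom(13, size = 15, prob = 0.98)
round(p_value, 4)  # 0.0353
```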
Note on Complement Rule
This problem demonstrates the utility of the complement rule in probability calculations. Instead of calculating probabilities for outcomes 0 through 13 (14 calculations), we only needed to calculate probabilities for outcomes 14 and 15 (2 calculations). This is particularly efficient when:
(…)
(…)