Statistical Hypothesis Testing: A Fundamental Logic

20.1 Statistical Hypothesis Testing (introduction)

Statistical inference is how we draw conclusions about a population from a sample. It’s like being a detective: we never have all the information, but we can make educated guesses based on the evidence we have.

Key Steps in Statistical Hypothesis Testing (general framework)
  1. Initial Suspicion/Research Question
    • We suspect some effect/relationship/difference
    • This guides our research design and analysis
  2. Data Collection
    • We collect appropriate amount of data
    • Sample size depends on expected effect and required precision
  3. Result Observation
    • We observe and summarize our data
    • Look at relevant patterns in the data
  4. Hypothesis System
    • H₀: no effect/no difference (“status quo”)
    • H₁: effect exists (one or two-sided)
    • Choice of direction based on research question
  5. P-value Approach
    • Consider: how likely are our results (or more extreme) if H₀ is true?
    • Choose appropriate probability model based on data type
    • Calculate this probability (p-value)
  6. Decision Making
    • Compare p-value to significance level (typically α = 0.05)
    • Small p-value suggests results unlikely under H₀
  7. Conclusion
    • If p ≤ α, reject H₀
    • Conclude evidence against null hypothesis
    • Consider practical significance
The Logic of Statistical Hypothesis Testing: A Probabilistic Proof by Contradiction

Research Context

A person claims to possess ESP (extrasensory perception) abilities that enable them to predict coin flips. This scenario illustrates the fundamental logic of statistical hypothesis testing.

  • Task: Predict 100 coin flips before each flip occurs
  • Observed Result: 70 correct predictions out of 100 attempts
  • Key Consideration: High success could indicate either ESP ability OR a biased coin

The Core Logic

Statistical hypothesis testing follows a logic similar to proof by contradiction in mathematics:

  1. We start by assuming what we want to disprove (the null hypothesis)
  2. Calculate the probability of our observed data under this assumption
  3. If this probability is extremely small, we reject the initial assumption

Statistical Framework

1. The Null Hypothesis Mechanism

The null hypothesis (H₀) serves as our “assumption to be disproven” and typically represents:

  • No effect
  • No difference
  • Pure chance
  • The status quo

In our ESP case: Random guessing (p = 0.5)

2. The Alternative Hypothesis

The alternative hypothesis (H₁) represents what we suspect might be true:

  • An effect exists
  • A difference exists
  • Not due to chance
  • A deviation from status quo

In our ESP case: Better than random guessing (p > 0.5)

3. The Decision Rule

We establish a conventional cutoff point (α) that defines “extremely unlikely”:

  • Typically set at α = 0.05 (5%)
  • Represents the threshold for “rare enough” to reject H₀
  • A conventional threshold, not a mathematical necessity

4. The P-value Mechanism

The p-value quantifies the logical argument:

  • Assuming H₀ is true
  • What’s the probability of our observed result or more extreme?
  • Small p-value means either:
    • H₀ is false (our desired conclusion)
    • A rare event occurred under H₀

P-value (statistical significance): In statistical hypothesis significance testing, the p-value is the probability of obtaining test results (outcomes) at least as extreme as the result actually observed, under the assumption that the null hypothesis (H0) is correct. A very small p-value means that, if the null hypothesis were true, the probability of observing data as extreme as or more extreme than what we actually observed would be very small (the empirical outcome “contradicts” H0). The smaller the p-value, the stronger the statistical evidence against the null hypothesis, leading us to reject H0 at predetermined significance levels (cut-off or threshold probability) such as 0.05 or 0.01, while recognizing that these thresholds are conventions rather than mathematically derived boundaries.

Application to ESP Testing

Statistical Hypotheses: \begin{align*} H_0&: p = 0.5 \text{ (random guessing)} \\ H_1&: p > 0.5 \text{ (better than guessing)} \end{align*}

Probability Calculation:

For 70 successes out of 100 trials:

\text{P-value} = P(X \geq 70) = \sum_{k=70}^{100} \binom{100}{k}(0.5)^k(0.5)^{100-k} \approx 0.0000393

Decision Framework: \text{Decision Rule} = \begin{cases} \text{Reject H}_0 & \text{if p-value} < 0.05 \\ \text{Fail to reject H}_0 & \text{if p-value} \geq 0.05 \end{cases}

0.0000393 < 0.05 (significance level)

This means that under the null hypothesis (pure guessing):

  • The probability of getting 70 or more correct predictions by chance is about 0.00393%
  • Such extreme results would occur by chance only about 4 times in 100,000 trials
  • This is far below our conventional significance level of 0.05 (5%)

The General Pattern

  1. Assume the null (like assuming not-A in proof by contradiction)
  2. Calculate probability of data under null
  3. If probability < α:
    • Reject null
    • Accept alternative
    • Conclude evidence supports our suspicion

Key Distinctions from Mathematical Proof

  1. Probabilistic rather than deterministic
  2. Conclusions are “supported” rather than “proven”
  3. Uses conventional thresholds (α)
  4. Always includes uncertainty

The binomial test is a hypothesis test used when you have binary (two-outcome) trials, where each trial is independent and has the same probability of success. It tests whether the observed proportion of successes differs significantly from an expected probability under the null hypothesis. For example: Testing whether a coin is fair by checking if the proportion of heads in 100 flips differs significantly from the expected probability of 0.5 under the null hypothesis.

What is a P-value?

A p-value is a probability that captures how extreme our observed data is relative to a null hypothesis:

The p-value is the probability of obtaining the observed outcome, or a more extreme one in the direction of the alternative hypothesis, assuming the null hypothesis (H₀) is true.

Key Components

  1. Observed outcome: The actual value or statistic computed from our sample data.
  2. More extreme outcomes: Additional outcomes that provide even stronger evidence against H₀: \begin{cases} \text{Values } \leq \text{ observed} & \text{for } H_1\text{: parameter < value} \\ \text{Values } \geq \text{ observed} & \text{for } H_1\text{: parameter > value} \\ \text{Values in both tails} & \text{for } H_1\text{: parameter } \neq \text{ value} \end{cases}
  3. Null hypothesis assumption: All probabilities are calculated using the parameter value specified in H₀.

One-Tailed vs Two-Tailed Tests

The choice between one-tailed and two-tailed tests depends on your alternative hypothesis and the context of your research question:

One-Tailed Tests:

  • Used when H₁ specifies a direction (< or >)
  • P-value calculated from one tail of the distribution
  • Higher power but requires strong directional justification
  • Example hypotheses: \begin{align*} H_0&: \mu = \mu_0 \\ H_1&: \mu > \mu_0 \text{ (right-tailed) or } \mu < \mu_0 \text{ (left-tailed)} \end{align*}

Two-Tailed Tests:

  • Used when H₁ is non-directional (≠)
  • P-value includes both tails of the distribution
  • More conservative, standard choice when direction uncertain
  • Particularly suitable for symmetric distributions like the normal distribution
  • Example hypotheses: \begin{align*} H_0&: \mu = \mu_0 \\ H_1&: \mu \neq \mu_0 \end{align*}

Example: Binomial Test

Testing if a politician is overestimating 98% support (p = 0.98) when observing 13 supporters in n = 15 people. In this context, a one-tailed test is most appropriate because the research question is inherently directional (overestimating implies p < 0.98).

\begin{align*} H_0&: p = 0.98 \\ H_1&: p < 0.98 \end{align*}

P-value calculation: \begin{align*} \text{p-value} &= P(X \leq 13 \mid H_0) \\ &= 1 - P(X \geq 14 \mid p = 0.98) \\ &= 0.0353 \end{align*}

Common Misconceptions to Avoid

  1. P-value ≠ Probability H₀ is true
  2. 1 - p-value ≠ Probability H₁ is true
  3. Large p-value ≠ H₀ is true
  4. One-tailed tests aren’t automatically “better” despite higher power
  5. The choice between one-tailed and two-tailed tests should be based on the research context, not just statistical convenience

Remember: The p-value quantifies evidence against H₀ but should be considered alongside practical significance and effect size.

The Method of Proof by Contradiction

In Mathematics

Proof that \sqrt{2} is Irrational

Initial Assumption

If \sqrt{2} is rational, then \sqrt{2} = \frac{p}{q} where:

  • p and q are integers
  • q \neq 0
  • p and q have no common factors

Algebraic Steps

  1. Starting with \sqrt{2} = \frac{p}{q}

  2. Square both sides: 2 = \frac{p^2}{q^2}

  3. Multiply both sides by q^2: 2q^2 = p^2

Properties of p and q

  1. Since 2q^2 = p^2, p^2 is even
  2. If p^2 is even, then p is even
  3. Therefore p = 2k for some integer k
  4. Substituting p = 2k into 2q^2 = p^2: 2q^2 = (2k)^2 = 4k^2
  5. Therefore q^2 = 2k^2
  6. Thus q^2 is even, so q is even

Contradiction

  • We proved both p and q are even
  • This contradicts our assumption that p and q have no common factors
  • Therefore, \sqrt{2} cannot be rational

Thus, \sqrt{2} is irrational.

In Statistics

We use a similar but probabilistic approach:

  1. Make assumption (null hypothesis)
  2. See if data contradicts this assumption
  3. If contradiction is strong enough, reject assumption

Key Difference: We deal with probability, not certainty.

The Null Hypothesis Framework

Step 1: State the Hypotheses

For a coin example:

  • H₀ (Null): Coin is fair (p = 0.5)
  • H₁ (Alternative): Coin is not fair (p ≠ 0.5)

Think of H₀ as the “innocent until proven guilty” assumption.

Step 2: Collect Evidence

Suppose we flip coin 100 times:

  • Expected under H₀: About 50 heads
  • Actually observe: 70 heads

Step 3: Assess Evidence

Ask: “If coin were truly fair (H₀ true), how likely is this result?”

This is like asking in a legal case:

  • “If defendant were innocent, how do we explain the evidence?”
  • Very improbable evidence suggests innocence might be false
Understanding Error Types in Hypothesis Testing

In hypothesis testing, we can make two types of errors:

Type I Error (False Positive)

  • Definition: Rejecting H₀ when it is actually true
  • Probability: α (significance level)
  • Example in Justice System: Convicting an innocent person
  • Example in Medicine: Diagnosing healthy patient as sick

Type II Error (False Negative)

  • Definition: Failing to reject H₀ when it is actually false
  • Probability: β
  • Power = 1 - β (probability of correctly rejecting false H₀)
  • Example in Justice System: Letting guilty person go free
  • Example in Medicine: Missing an actual disease

Trade-off Between Errors

  • Decreasing α (being more conservative) increases β
  • Decreasing β (increasing power) requires either:
    • Larger sample size
    • Larger effect size
    • Higher α

Practical Implications

Context Type I Concern Type II Concern Typical α
Criminal Justice Convict innocent Free guilty 0.001
Medical Testing Unnecessary treatment Miss disease 0.01
Historical Note: Jerzy Neyman (1894-1981)

The framework of statistical hypothesis testing as we know it today was largely developed by Jerzy Neyman, a Polish mathematician and statistician, in collaboration with Egon Pearson. Born in Bendery, Imperial Russia (now Moldova), Neyman made fundamental contributions to statistics that transformed both theoretical foundations and practical applications.

His most significant contributions include:

  • Development of the formal hypothesis testing framework, introducing the concepts of Type I and Type II errors
  • Creation of confidence intervals as a way to express uncertainty in estimation
  • Pioneering the potential outcomes framework in causal inference
  • Advancement of sampling theory

The potential outcomes framework, first introduced by Neyman in his 1923 master’s thesis on agricultural experiments, revolutionized how we think about causality in statistics. This framework, later rediscovered and expanded by Donald Rubin (hence sometimes called the Neyman-Rubin causal model), introduced the concept of comparing potential outcomes that would occur under different treatments. For each unit, Neyman conceived of multiple potential outcomes, only one of which could be observed - a fundamental concept now known as the “fundamental problem of causal inference.”

His approach to statistical inference differed notably from R.A. Fisher’s significance testing, leading to important debates that helped shape modern statistical theory and practice.

The potential outcomes framework he introduced has become particularly influential in modern causal inference, epidemiology, and social sciences research. The impact of his contributions continues to be felt in how we approach statistical inference, experimental design, and causal analysis today.

The Logic of P-values

What is a p-value?

p-value = P(seeing this evidence or more extreme | H₀ is true)

Like asking:

  • “How surprising is this evidence if H₀ is true?”
  • “Could this easily happen by chance?”

Example: Step by Step

Observe 8 heads in 10 flips:

  1. Assume H₀: p = 0.5 (fair coin)

  2. Calculate: P(X ≥ 8) = P(8 heads) + P(9 heads) + P(10 heads) = \binom{10}{8}(0.5)^8(0.5)^2 + \binom{10}{9}(0.5)^9(0.5)^1 + \binom{10}{10}(0.5)^{10} ≈ 0.055

  3. Interpret:

    • About 5.5% chance of seeing this under H₀
    • Moderately unusual, but not extremely so

Common Misunderstandings

  1. “p-value is probability H₀ is true”
    • No: It’s probability of data, assuming H₀
    • Like P(evidence|innocent), not P(innocent|evidence)
  2. “Small p-value proves H₀ false”
    • No: Only suggests H₀ unlikely
    • Like strong evidence, but not proof
  3. “Large p-value proves H₀ true”
    • No: Just fails to provide evidence against H₀
    • Like “not guilty” vs “proven innocent”
Beyond Simple Significance Testing

Why Not Just Use 5%?

Historical reasons:

  1. R.A. Fisher suggested as benchmark
  2. Pre-computed tables used 5%
  3. Became convention, not logical necessity

Better Approach:

Consider p-value on continuous scale:

  • p = 0.049 vs p = 0.051 are virtually same
  • Context matters:
    • Medical trials might need p < 0.001
    • Market research might accept p < 0.1

Practical Significance

Always consider:

  1. Effect size (how big is the difference?)
  2. Practical importance
  3. Costs and benefits of decisions
  4. Sample size and power

20.2 Problem Solutions – part 1: the binomial tests (one-tailed/sided tests)

The binomial test is a statistical hypothesis test used when you have binary (two-outcome) trials, where each trial is independent and has the same probability of success. It tests whether the observed proportion of successes differs significantly from an expected probability under the null hypothesis. For example: Testing whether a coin is fair by checking if the proportion of heads in 100 flips differs significantly from the expected probability of 0.5 under the null hypothesis.

Problem 1: Binomial Test for Candidate Support

Problem Statement

An election candidate believes she has the support of 50% (p = 0.5) of the residents in a particular town. A researcher suspects this might be an underestimation and conducts a survey. The researcher asks 10 people whether they support the candidate or not; 7 people say that they do (70% in a sample).

Calculate the p-value and decide whether there is enough evidence to reject H0 using data from the sample (assuming the critical probability = 5%).

Setup

Hypotheses:

  • H_0: p = 0.5 (null hypothesis: true proportion (support) is 50%)
  • H_1: p > 0.5 (alternative hypothesis: true proportion is greater than 50%)

Data:

  • Sample size: n = 10
  • Observed successes: x = 7 (70% support)
  • Hypothesized proportion: p_0 = 0.5
  • Significance level: \alpha = 0.05 (5%)

Finding the P-value

For a one-sided test, the p-value is the probability of observing 7 or more successes out of 10 trials, assuming H_0 is true. Using the binomial distribution:

P(X \geq 7) = P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)

Why One-Tailed Test?

This is a one-tailed/sided test because we’re specifically interested in whether the candidate is under-estimating her support. In statistical terms:

  • If she believes support is 50% but it’s actually 40%, she’s over-estimating her support
  • If she believes support is 50% but it’s actually 60%, she’s under-estimating her support

Our research question only concerns under-estimation, so we only need to consider evidence in that direction (values greater than 50%). This is reflected in our alternative hypothesis H_1: p > 0.5.

Why Not Just P(X = 7)?

We can’t just calculate P(X = 7) because:

  1. The p-value represents the probability of observing results as extreme or more extreme than what we saw, assuming H_0 is true
  2. “More extreme” means results that provide even stronger evidence against H_0 in the direction of H_1
  3. Since H_1 suggests higher proportions than 50%, outcomes of 8, 9, or 10 supporters out of 10 would be even stronger evidence against H_0

Therefore, we must sum:

P(X \geq 7) = P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)

If we only calculated P(X = 7) = 0.1172, we would ignore these other possible outcomes that also support H_1, leading to an incorrect p-value.

For each value k, we use the binomial probability formula:

P(X = k) = \binom{n}{k} p_0^k (1-p_0)^{n-k}

Let’s calculate each term:

  • P(X = 7) = \binom{10}{7} (0.5)^7 (0.5)^3 = 120 \cdot (0.5)^{10} = 0.1172
  • P(X = 8) = \binom{10}{8} (0.5)^8 (0.5)^2 = 45 \cdot (0.5)^{10} = 0.0439
  • P(X = 9) = \binom{10}{9} (0.5)^9 (0.5)^1 = 10 \cdot (0.5)^{10} = 0.0098
  • P(X = 10) = \binom{10}{10} (0.5)^{10} (0.5)^0 = 1 \cdot (0.5)^{10} = 0.0010

P-value = 0.1172 + 0.0439 + 0.0098 + 0.0010 = 0.1719 (17.19%)

Decision

Since the p-value (0.1719) is greater than the significance level (0.05), we fail to reject the null hypothesis.

Interpretation

There is not enough evidence to conclude that the candidate is under-estimating her support. While the sample shows 70% support (higher than 50%), this difference could reasonably occur by chance even if the true support was only 50%. The relatively small sample size (n = 10) makes it harder to detect real differences.

Problem 2: Binomial Test for Candidate Support (2)

Problem Statement

An election candidate believes she has the support of 40% (p = 0.4) of the residents in a particular town. A researcher suspects this might be an overestimation and conducts a survey. The researcher asks 20 people whether they support the candidate or not; 3 people say that they do (15% in a sample). Calculate the p-value and decide whether there is enough evidence to reject H0 using data from the sample (assuming the critical probability = 5%).

Setup

Hypotheses:

  • H_0: p = 0.4 (null hypothesis: true proportion is 40%)
  • H_1: p < 0.4 (alternative hypothesis: true proportion is less than 40%)

Data:

  • Sample size: n = 20
  • Observed successes: x = 3 (15% support)
  • Hypothesized proportion: p_0 = 0.4
  • Significance level: \alpha = 0.05 (5%)

Why One-Tailed Test?

This is a one-tailed test because we’re specifically interested in whether the candidate is over-estimating her support. We only care about evidence suggesting the true proportion is less than 40%, leading to a left-tailed test.

Finding the P-value

For this left-tailed test, the p-value is the probability of observing 3 or fewer successes out of 20 trials, assuming H_0 is true. Using the binomial distribution:

P(X \leq 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)

For each value k, we use the binomial probability formula: P(X = k) = \binom{n}{k} p_0^k (1-p_0)^{n-k}

  1. We want to find P(X ≤ 3) when X follows B(20, 0.4) Using the binomial formula:

P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}

  1. Calculate the combinations \binom{20}{k} for k = 0, 1, 2, 3:

\binom{20}{0} = 1

\binom{20}{1} = 20

\binom{20}{2} = \frac{20 \times 19}{2 \times 1} = 190

\binom{20}{3} = \frac{20 \times 19 \times 18}{3 \times 2 \times 1} = 1,140

  1. Calculate each probability:

For k = 0: P(X = 0) = \binom{20}{0}(0.4)^0(0.6)^{20} = 1 \times 1 \times 0.6^{20} \approx 0.0000366

For k = 1: P(X = 1) = \binom{20}{1}(0.4)^1(0.6)^{19} = 20 \times 0.4 \times 0.6^{19} \approx 0.0004875

For k = 2: P(X = 2) = \binom{20}{2}(0.4)^2(0.6)^{18} = 190 \times 0.16 \times 0.6^{18} \approx 0.0030874

For k = 3: P(X = 3) = \binom{20}{3}(0.4)^3(0.6)^{17} = 1,140 \times 0.064 \times 0.6^{17} \approx 0.0123497

  1. Sum all probabilities: P(X \leq 3) = \sum_{k=0}^3 P(X = k) = 0.0000366 + 0.0004875 + 0.0030874 + 0.0123497 = 0.0159612

  2. Decision rule:

  • Reject H₀ if p-value < α
  • Since 0.0159612 < 0.05, we reject H₀.

Decision

Since the p-value is less than the significance level (0.05), we reject the null hypothesis.

Interpretation

There is sufficient evidence at the 5% significance level to conclude that the candidate is overestimating her support. The sample shows only 15% support, which is significantly lower than the candidate’s belief of 40%. The probability of observing such low support (3 or fewer out of 20) would be only about 1.6% if the true support were actually 40%.

Problem 3: Binomial Test for Candidate Support (3)

Problem Statement

A political candidate claims that 40% of residents in a town support her campaign (p = 0.4). A researcher suspects this might be an overestimation and conducts a survey. In a random sample of 12 residents, 1 person expresses support for the candidate. Test whether there is sufficient evidence to conclude that the candidate is overestimating her support level, using a significance level of 5%.

Hypotheses

\begin{align*} H_0&: p = 0.4 \text{ (The candidate's claim is correct)} \\ H_1&: p < 0.4 \text{ (The candidate is overestimating support)} \end{align*}

Given Information

  • Sample size: n = 12
  • Number of successes: x = 1
  • Hypothesized proportion: p_0 = 0.4
  • Significance level: \alpha = 0.05
  • Observed proportion: \hat{p} = \frac{1}{12} \approx 0.083

Solution

  1. For a left-tailed test, we calculate the probability of observing 1 or fewer successes under H_0.

  2. Using the binomial probability formula: P(X \leq 1) = \sum_{k=0}^{1} \binom{12}{k}(0.4)^k(0.6)^{12-k}

  3. We find: P(X \leq 1) = 0.0196

  4. Decision Rule:

    • Reject H_0 if p-value < \alpha
    • Since 0.0196 < 0.05, we reject H_0

Conclusion

At a 5% significance level, there is sufficient evidence to conclude that the candidate is overestimating her support. The sample proportion (8.3%) is substantially lower than the claimed 40% support, and this difference is statistically significant (p = 0.0196).

Statistical Power Consideration

While the sample size (n = 12) is relatively small, we were still able to detect a significant difference. This is because the observed difference between the claimed proportion (40%) and sample proportion (8.3%) was quite large. However, a larger sample size would provide more reliable results and better estimation of the true support proportion.

Problem 4: Binomial Test for Candidate Support (4)

An election candidate claims that 20% of residents in a town support her campaign. A researcher believes the candidate might be over-estimating her support and wants to test this claim. In a random sample of 12 residents, 4 people express support for the candidate. Test whether there is sufficient evidence to conclude that the candidate is over-estimating her support level, using a significance level of 5%.

Given:

  • Claimed support: p = 0.2
  • Sample size: n = 12
  • Number of supporters in sample: x = 4
  • Significance level: α = 0.05

Solution

Step 1: State the Hypotheses

Since we want to test if the candidate is over-estimating (true proportion is less than claimed):

\begin{align*} H_0&: p = 0.2 \text{ (The candidate's claim is correct)} \\ H_1&: p < 0.2 \text{ (The candidate is overestimating support)} \end{align*}

Step 2: Choose the Test Statistic

We use the number of successes (X) in the sample, where X follows a binomial distribution with n = 12 and p = 0.2 under H₀.

Observed value: x = 4

Step 3: Calculate the Test Statistic and P-value

For a left-tailed test, we calculate:

P(X \leq 4) = \sum_{k=0}^{4} \binom{12}{k}(0.2)^k(0.8)^{12-k}

# Calculate p-value
p_value <- pbinom(4, size = 12, prob = 0.2)
print(paste("P-value =", round(p_value, 4)))
[1] "P-value = 0.9274"

The p-value is 0.9274

Step 4: Decision Rule

  • Reject H₀ if p-value < α
  • Since 0.9274 > 0.05, we fail to reject H₀

Step 5: Interpretation

At a 5% significance level, there is not enough evidence to conclude that the candidate is over-estimating her support. In fact, the sample data shows 4/12 ≈ 33.3% support, which is higher than her claimed 20%, going in the opposite direction of our alternative hypothesis.

Additional Notes

  1. Sample Proportion: \hat{p} = \frac{x}{n} = \frac{4}{12} = 0.333

  2. The high p-value reflects that the sample proportion (33.3%) is actually higher than the hypothesized value (20%), not lower as we were testing for.

  3. If we had suspected under-estimation rather than over-estimation, we should have set up the test with H₁: p > 0.2.

  4. Given the small sample size (n = 12), the power of this test to detect true differences is limited.

R Code

Here’s the complete R code for this analysis:

# Given values
n <- 12          # sample size
x <- 4           # number of successes
p0 <- 0.2        # hypothesized proportion
alpha <- 0.05    # significance level

# Calculate p-value for left-tailed test
p_value <- pbinom(x, size = n, prob = p0)

# Calculate sample proportion
p_hat <- x/n

# Print results
cat("Sample proportion =", round(p_hat, 3), "\n")
Sample proportion = 0.333 
cat("P-value =", round(p_value, 4), "\n")
P-value = 0.9274 
cat("Decision: ", ifelse(p_value < alpha, "Reject H0", "Fail to reject H0"), "\n")
Decision:  Fail to reject H0 

Problem 5: Binomial Test for EU Support

Problem Statement

A politician believes that support for his country’s EU membership is about 98% (p = 0.98). A researcher wants to test whether the politician is overestimating this level of support.

In a sample of 15 people (n = 15), the researcher observes that 13 people support membership. Let’s define the random variable X as the number of people in the sample who support EU membership. We observed X = 13 “successes” in 15 Bernoulli trials.

Is there enough evidence to reject the claim that the support is 98%?

Setup

Hypotheses:

  • H_0: p = 0.98 (null hypothesis: true proportion is 98%)
  • H_1: p < 0.98 (alternative hypothesis: true proportion is less than 98%)

Data:

  • Sample size: n = 15
  • Observed successes: x = 13 (≈86.7% support)
  • Hypothesized proportion: p_0 = 0.98
  • Significance level: \alpha = 0.05 (5%)

Finding the P-value

For this left-tailed test, we need P(X \leq 13). Given the high value of p_0, it’s more efficient to use the complement rule:

P(X \leq 13) = 1 - P(X \geq 14)

The complement rule in probability states that P(A) = 1 - P(not A), because P(A) + P(not A) = 1. For a left-tailed test, we need P(X ≤ 13). Instead of summing P(X = 0) + P(X = 1) + … + P(X = 13), it’s easier to: calculate P(X ≤ 13) = 1 - P(X > 13), or P(X ≤ 13) = 1 - P(X ≥ 14).

For this left-tailed test: P(X \leq 13) = 1 - P(X > 13) = 1 - P(X = 14) - P(X = 15)

Let’s calculate step by step:

  1. For X = 14:
  • Calculate combination:

\binom{15}{14} = \frac{15!}{14!(15-14)!} = \frac{15}{1} = 15

  • Calculate probability:

P(X = 14) = \binom{15}{14}(0.98)^{14}(0.02)^1 = 15 \times (0.98)^{14} \times 0.02 = 15 \times 0.75051... \times 0.02 \approx 0.2252

  1. For X = 15:
  • Calculate combination:

\binom{15}{15} = 1

  • Calculate probability:

P(X = 15) = \binom{15}{15}(0.98)^{15}(0.02)^0 = 1 \times (0.98)^{15} \times 1 = (0.98)^{15} \approx 0.7395

  1. Therefore:

P(X \leq 13) = 1 - P(X = 14) - P(X = 15) = 1 - 0.2252 - 0.7395 \approx 0.0353

Decision

Since the p-value (0.0353) is less than the significance level (0.05), we reject the null hypothesis.

Interpretation

There is sufficient evidence to conclude that the politician is overestimating the support for EU membership. While 86.7% support in the sample is still very high, it’s significantly lower than the politician’s claim of 98%. Under the assumption that true support is 98%, the probability of observing 13 or fewer supporters in a sample of 15 people would be only about 3.53%.

Note on Complement Rule

This problem demonstrates the utility of the complement rule in probability calculations. Instead of calculating probabilities for outcomes 0 through 13 (14 calculations), we only needed to calculate probabilities for outcomes 14 and 15 (2 calculations). This is particularly efficient when:

  1. We have a large number of outcomes in the “tail” we’re interested in
  2. The probability is concentrated in the other tail (here, high values due to p_0 = 0.98)

20.3 Problem Solutions – part 2: the binomial tests (two-tailed/sided tests (*))

(…)

20.4 Testing OLS Regression Parameter Significance Using R (*)

(…)