flowchart TD
    Start[Problem: Count Possible Outcomes] --> Q1{"Are we counting outcomes\nthat happen in sequence?"}
    Q1 -->|Yes| M1[Multiplication Rule:\nMultiply choices for each step]
    Q1 -->|No| Q2{"Are we counting different\nways to achieve same result?"}
    M1 --> ME1[Examples of Sequential Choices]
    ME1 --> MC1["Password: letter then number\n26 letters × 10 numbers\n= 260 possibilities"]
    MC1 --> MC2["Travel: bus then train\n3 bus routes × 2 train routes\n= 6 possible journeys"]
    Q2 -->|Yes| Q3{"Do options overlap?"}
    Q3 -->|No| A1[Simple Addition Rule:\nAdd all possibilities]
    Q3 -->|Yes| A2["Extended Addition Rule:\nAdd - Overlap"]
    A1 --> AE1[Examples of Non-Overlapping Options]
    AE1 --> AC1["Coin toss: H or T\n1 + 1 = 2 outcomes"]
    AC1 --> AC2["License type: Car or Motorcycle\n100 + 50 = 150 types"]
    A2 --> AE2[Examples of Overlapping Options]
    AE2 --> AC3["Students in Sports or Music:\n45 + 35 - 15 in both\n= 65 students"]
    classDef start fill:#2d5a8c,stroke:#333,color:#fff,stroke-width:2px
    classDef question fill:#d4426e,stroke:#333,color:#fff,stroke-width:2px
    classDef rule fill:#156b45,stroke:#333,color:#fff,stroke-width:2px
    classDef example fill:#4a4a4a,stroke:#333,color:#fff
    class Start start
    class Q1,Q2,Q3 question
    class M1,A1,A2 rule
    class ME1,AE1,AE2,MC1,MC2,AC1,AC2,AC3 example
17 Introduction to (Discrete) Probability
17.1 Probability: Preliminary Concepts
Imagine you’re trying to decide whether to bring an umbrella to class tomorrow. You check the weather forecast, which says there’s a 30% chance of rain. But what does this number really mean?
This is where probability comes in - it’s a mathematical way to measure how likely something is to happen.
A probability represents the likelihood or chance of an event occurring, expressed as a number between 0 and 1 (or as a percentage between 0% and 100%).
Before we dive into probability theory, let’s establish some foundational concepts that we’ll use throughout this course.
Basic Set Concepts
Before we can understand probability, we need to grasp some fundamental concepts from set theory. A set is simply a collection of distinct objects.
A set can be defined by:
- Listing all elements: A = \{1, 2, 3\}
- Describing a property: B = \{x \mid x \text{ is a positive integer less than 4}\}
The empty set \emptyset contains no elements.
If A and B are sets:
- If A is a subset of B, we write A \subseteq B
- If x is an element of A, we write x \in A
For example, if B = \{1, 2, 3\}:
- \{1, 2\} is a subset of B (written \{1, 2\} \subseteq B)
- 1 is an element of B (written 1 \in B)
The proper subset notation uses a strict subset symbol. If A is a proper subset of B, we write:
A \subset B
This means that A is a subset of B AND A \neq B (A is not equal to B).
In contrast, A \subseteq B allows for the possibility that A = B.
A set is a fundamental mathematical concept - it’s a collection of distinct objects where order doesn’t matter and duplicates are not allowed. In other words, each element either belongs to the set or it doesn’t, with no concept of “how many times” it belongs.
Formally, if x \in A (meaning x is an element of set A), then adding another copy of x has no effect on A. This gives us identities like:
\{1, 2, 2, 3\} = \{1, 2, 3\} = \{3, 1, 2\}
This distinguishes sets from other mathematical collections:
Lists/Sequences: Order matters and duplicates are allowed
- [1, 2, 2, 3] ≠ [1, 2, 3]
- [1, 2, 3] ≠ [3, 2, 1]
Multisets: Order doesn’t matter but duplicates are allowed
- {1, 2, 2, 3}ₘ ≠ {1, 2, 3}ₘ
- {1, 2, 2, 3}ₘ = {3, 2, 1, 2}ₘ
This unique property of sets - that membership is binary (an element either belongs or doesn’t) - makes them particularly useful in mathematics for describing collections where we only care about whether something is present, not how many times it appears or in what order.
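The contrast between sets, lists, and multisets can be checked directly in code. The sketch below is only an illustration: it models multisets with Python's `collections.Counter`, which is a modelling choice made here and not something the text prescribes, and all variable names are hypothetical.

```python
from collections import Counter

# Sets: order and repetition are ignored.
sets_equal = {1, 2, 2, 3} == {1, 2, 3} == {3, 1, 2}

# Lists: both repetition and order matter.
lists_differ_by_duplicate = [1, 2, 2, 3] != [1, 2, 3]
lists_differ_by_order = [1, 2, 3] != [3, 2, 1]

# Multisets (modelled with Counter): repetition matters, order does not.
multisets_differ = Counter([1, 2, 2, 3]) != Counter([1, 2, 3])
multisets_equal = Counter([1, 2, 2, 3]) == Counter([3, 2, 1, 2])

print(sets_equal, lists_differ_by_duplicate, lists_differ_by_order,
      multisets_differ, multisets_equal)
```

All five comparisons evaluate to `True`, mirroring the identities and inequalities listed above.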
Set Operations
Basic operations on sets (and therefore on events), given two sets A and B:
- Union (A \cup B): Elements in either A OR B (or both)
- Intersection (A \cap B): Elements in BOTH A AND B
- Complement (A^c or A^{'}): Elements NOT in A
- Difference (A \setminus B): Elements in A but NOT in B
These operations follow important laws like:
(A \cup B)^c = A^c \cap B^c (DeMorgan’s Law)
A \cup (B \cap C) = (A \cup B) \cap (A \cup C) (Distributive Law)
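Both laws can be spot-checked on concrete sets. The following is a minimal sketch, assuming a small universal set `omega` and three arbitrary sets `A`, `B`, `C` chosen purely for illustration. A check on one example is not a proof, but it is a quick way to catch a misremembered law.

```python
# Hypothetical universal set and example sets (illustrative values only).
omega = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8, 10}

def complement(s, universe=omega):
    """Return the elements of the universe that are NOT in s."""
    return universe - s

# De Morgan: (A ∪ B)^c = A^c ∩ B^c
de_morgan_holds = complement(A | B) == (complement(A) & complement(B))

# Distributive: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
distributive_holds = (A | (B & C)) == ((A | B) & (A | C))

print(de_morgan_holds, distributive_holds)
```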
Sets and the associated operations are easy to visualize in terms of Venn diagrams, as illustrated in the figure below:
Examples of Venn diagrams:
The shaded region is S \cap T.
The shaded region is S \cup T.
The shaded region is S \cap T^c.
Here, T \subset S. The shaded region is the complement of S.
The sets S, T, and U are disjoint.
The sets S, T, and U form a partition of the universal set \Omega.
The Universal Set (often denoted as \Omega, U, or S):
- In Set Theory:
- Set containing all elements in a given context
- All other sets are its subsets
- Complement of set A is A' = \Omega \setminus A
- In Probability:
- Called the sample space S or \Omega
- Contains all possible outcomes
- Has probability P(\Omega) = 1
Key Properties:
- A \subseteq \Omega
- A \cup A' = \Omega
- A \cap A' = \emptyset
- \Omega' = \emptyset
- \emptyset' = \Omega
Examples:
- Die roll: \Omega = \{1,2,3,4,5,6\}
- Coin flip: \Omega = \{H,T\}
Set theory provides the mathematical framework for probability theory. Here are the key parallels:
Set Theory | Probability Theory | Description |
---|---|---|
\Omega (Universal set) | Sample space (S) | All possible outcomes |
x \in A (Element) | Outcome | Single result |
A \subseteq \Omega (Subset) | Event | Collection of outcomes |
\emptyset (Empty set) | Impossible event | Cannot occur (P(\emptyset) = 0) |
\Omega (Universal set) | Certain event | Must occur (P(\Omega) = 1) |
A \cup B (Union) | Either A OR B | P(A \cup B) = P(A) + P(B) - P(A \cap B) |
A \cap B (Intersection) | Both A AND B | P(A \cap B) = P(A)P(B) (if independent) |
A' (Complement) | Not A | P(A') = 1 - P(A) |
A \cap B = \emptyset | Mutually exclusive | P(A \cap B) = 0 |
In set theory, we denote cardinality (the number of elements in a set) using vertical bars: |A|
Key points:
- |A| means “number of elements in set A”
- For a finite set like A = \{1, 2, 3\}, we have |A| = 3
- Empty set has cardinality zero: |\emptyset| = 0
- For two sets A and B:
- Union (no overlap): |A \cup B| = |A| + |B|
- Union (with overlap): |A \cup B| = |A| + |B| - |A \cap B|
- Cartesian product: |A \times B| = |A| \times |B|
Example:
If A = \{\spadesuit, \clubsuit, \heartsuit, \diamondsuit\} and B = \{K, Q, J\}, then:
- |A| = 4
- |B| = 3
- |A \times B| = 12 (all possible combinations)
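The card example can be reproduced with `itertools.product`, which builds the Cartesian product. This is a small sketch; the suit names are spelled out as strings for readability, and the two overlapping sets at the end are made-up values used only to illustrate the union formula.

```python
from itertools import product

suits = {"spades", "clubs", "hearts", "diamonds"}   # the set A
ranks = {"K", "Q", "J"}                             # the set B

# Cartesian product: every (suit, rank) pair.
face_cards = set(product(suits, ranks))
product_size = len(face_cards)                      # |A x B| = |A| * |B|

# Union with overlap: |X ∪ Y| = |X| + |Y| - |X ∩ Y|  (hypothetical sets)
X = {1, 2, 3, 4}
Y = {3, 4, 5}
union_size = len(X | Y)
formula_size = len(X) + len(Y) - len(X & Y)

print(product_size, union_size, formula_size)
```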
The cardinality of a set A is denoted by |A| or #A. Some worked examples:
|\{apple, orange, watermelon\}| = 3 (Each element is distinct)
|\{1, 1, 1, 1, 1\}| = 1 (In a set, duplicates are counted only once)
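|[0, 1]| = 2^{\aleph_0} (This interval of real numbers is uncountably infinite; it has the cardinality of the continuum. Writing \aleph_1 here would assume the continuum hypothesis.)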
|\{1, 2, 3, \cdots\}| = \aleph_0 (This is countably infinite)
|\{\emptyset, \{1\}, \{2\}, \{1, 2\}\}| = 4 (Each element is a distinct set)
|\{\emptyset, \{1\}, \{1, 1\}, \{1, 1, 1\}, \cdots\}| = 2 (After removing duplicates: \{\emptyset, \{1\}\} since \{1\} = \{1, 1\} = \{1, 1, 1\} = \cdots)
Understanding Set Relations: Elements vs Subsets (*)
This section clarifies the difference between an element belonging to a set and one set being a subset of another.
The “Belongs To” Relationship (\in)
When we say an element belongs to a set (written as x \in A), we’re describing membership of a single item in a collection. Think of a classroom: each individual student belongs to (is a member of) the class. They are elements of the set “class.”
Consider a deck of cards and let H be the set of all hearts:
H = \{2♥, 3♥, 4♥, 5♥, 6♥, 7♥, 8♥, 9♥, 10♥, J♥, Q♥, K♥, A♥\}
We can say:
- A♥ \in H (true, because the ace of hearts is one of the hearts)
- K♠ \notin H (true, because the king of spades is not a heart; equivalently, K♠ \in H is false)
- \{A♥\} \notin H (true, because \{A♥\} is a set containing the ace of hearts, not the card itself)
The “Is Contained In” Relationship (\subseteq)
A subset relationship (written as A \subseteq B) describes when one set is entirely contained within another set. Every element of the smaller set must appear in the larger set. This is different from set membership (\in), which describes when a single element belongs to a set.
To understand the distinction, let’s look at some examples:
Consider the following sets:
- A = \{1, 2\}
- B = \{1, 2, 3, 4\}
- C = \{1\}
For set membership (\in):
- 1 \in A (the number 1 is an element of set A)
- \{1\} \notin A (the set containing 1 is not an element of A)
- 2 \in B (the number 2 is an element of B)
For subset relationships (\subseteq):
- A \subseteq B (all elements of A are in B)
- C \subseteq A (all elements of C are in A)
- \{1\} \subseteq A (the set containing 1 is a subset of A)
A key insight is that while 1 \in A is true (1 is an element of A), \{1\} \in A is false (the set containing 1 is not an element of A). However, \{1\} \subseteq A is true (the set containing 1 is a subset of A).
Think of it this way: membership (\in) asks “Is this single thing in the set?” while subset (\subseteq) asks “Is every element of this smaller set found in the larger set?”
Another helpful example is with the empty set \emptyset:
- \emptyset \subseteq A for any set A (the empty set is a subset of every set)
- But \emptyset \notin A unless A specifically contains the empty set as an element
Exercise (https://www.alextsun.com/files/Prob_Stat_for_CS_Book.pdf). Using the given sets:
- A = \{1, 3\}
- B = \{3, 1\}
- C = \{1, 2\}
- D = \{\emptyset, \{1\}, \{2\}, \{1, 2\}, 1, 2\}
Determine whether the following are true or false:
- 1 \in A : TRUE (1 is an element of A)
- 1 \subseteq A : FALSE (1 is not a set, so subset relation doesn’t apply)
- \{1\} \subseteq A : TRUE (every element of the set {1} is an element of A)
- \{1\} \in A : FALSE (A doesn’t contain any sets as elements)
- 3 \notin C : TRUE (3 is not an element of C)
- A \in B : FALSE (B doesn’t contain any sets as elements)
- A \subseteq B : TRUE (A and B contain the same elements)
- C \in D : TRUE (C = \{1, 2\}, and the set \{1, 2\} is listed as an element of D)
- C \subseteq D : TRUE (all elements of C (1 and 2) are also elements of D)
- \emptyset \in D : TRUE (empty set is listed as an element of D)
- \emptyset \subseteq D : TRUE (the empty set is a subset of every set: for the statement to fail, some element of \emptyset would have to be missing from D, but \emptyset has no elements)
- A = B : TRUE (they contain the same elements)
- \emptyset \subseteq \emptyset : TRUE (empty set is a subset of itself; the empty set is a subset of any set)
- \emptyset \in \emptyset : FALSE (empty set contains no elements)
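The answers above can be double-checked mechanically. The sketch below is one possible encoding, not part of the original exercise: it uses `frozenset` because Python sets can only contain hashable elements, `in` plays the role of \in, and `<=` plays the role of \subseteq. The statement 1 \subseteq A is left out because it has no meaningful counterpart (1 is not a set).

```python
A = frozenset({1, 3})
B = frozenset({3, 1})
C = frozenset({1, 2})
EMPTY = frozenset()
# D mixes sets and plain numbers as elements, exactly as in the exercise.
D = frozenset({EMPTY, frozenset({1}), frozenset({2}), frozenset({1, 2}), 1, 2})

results = {
    "1 in A": 1 in A,
    "{1} subset of A": frozenset({1}) <= A,
    "{1} in A": frozenset({1}) in A,
    "3 not in C": 3 not in C,
    "A in B": A in B,
    "A subset of B": A <= B,
    "C in D": C in D,
    "C subset of D": C <= D,
    "empty in D": EMPTY in D,
    "empty subset of D": EMPTY <= D,
    "A == B": A == B,
    "empty subset of empty": EMPTY <= EMPTY,
    "empty in empty": EMPTY in EMPTY,
}

for statement, value in results.items():
    print(f"{statement}: {value}")
```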
17.2 Set Theory and Power Sets (Event Space)
The power set of a set S, denoted as \mathcal{F}(S) or 2^S, is the set of all possible subsets of S, including the empty set and S itself. The notation 2^S reflects its size: a set with |S| elements has 2^{|S|} subsets.
This concept is crucial in probability theory because it helps us understand the relationship between the sample space S (all possible outcomes) and the event space (all possible events (i.e. all possible subsets of S) we might want to consider).
Let’s explore this with a simple example. Consider flipping a single coin where: S = \{H, T\} (our sample space)
The power set would be:
\mathcal{F}(S) = \{\emptyset, \{H\}, \{T\}, \{H,T\}\}
Each element in the power set represents a possible event. For instance:
- \emptyset: The impossible event (e.g., the coin landing neither heads nor tails)
- \{H\}: The event of getting heads
- \{T\}: The event of getting tails
- \{H,T\}: The certain event (the coin must land either heads or tails)
For a set with n elements, its power set will have 2^n elements. This is because for each element, we have two choices: include it or not include it in a subset.
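The "include it or not" argument translates directly into a short enumeration. The helper `power_set` below is a hypothetical name introduced for this sketch; it collects all subsets of every size using `itertools.combinations`.

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of s, each represented as a frozenset."""
    items = list(s)
    return {
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    }

coin = {"H", "T"}
coin_events = power_set(coin)

# The 2^n rule: a 4-element set has 2^4 = 16 subsets.
four_element_count = len(power_set({1, 2, 3, 4}))

print(len(coin_events), four_element_count)
```

For the coin, the result contains exactly the four events listed above: the empty set, \{H\}, \{T\}, and \{H,T\}.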
17.3 Counting Rules in Probability: The Power of AND & OR
A fundamental challenge in probability is counting possible outcomes. Two key rules help us solve these problems:
The Multiplication Rule for Independent Events (“AND” Situations)
When we need a sequence of independent choices where we must make ALL choices, we multiply the number of possibilities for each choice. This principle applies when we need option A AND option B AND option C, etc.
For example, consider creating a password with exactly three characters in this order:
- First character must be a letter (26 choices)
- Second character must be a digit (10 choices)
- Third character must be a symbol (@, #, $, or % - so 4 choices)
Total possible passwords = 26 × 10 × 4 = 1,040
This is like filling three slots where each slot has its own set of valid options. Each new requirement multiplies our total possibilities.
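To see the multiplication rule at work, the three slots can be enumerated explicitly. This sketch assumes the 26 letters are the lowercase ASCII letters, which the example does not specify; the count is the same for any fixed 26-letter alphabet.

```python
from itertools import product
import string

letters = string.ascii_lowercase   # 26 choices (assumed: lowercase only)
digits = string.digits             # 10 choices
symbols = "@#$%"                   # 4 choices

# One password per (letter, digit, symbol) combination.
passwords = ["".join(chars) for chars in product(letters, digits, symbols)]
total_passwords = len(passwords)

print(total_passwords, passwords[:3])
```

The list length equals 26 × 10 × 4, so counting by enumeration and counting by the rule agree.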
The Addition Rule for Mutually Exclusive Events (“OR” Situations)
When there are multiple valid ways to achieve a goal, and we can use ANY ONE of these ways, we add the number of possibilities. This applies when we accept option A OR option B OR option C, etc.
For example, if a password must be EITHER:
- A 3-letter string (26³ possibilities) OR
- A 4-digit number (10⁴ possibilities)
Total possibilities = 26³ + 10⁴ = 17,576 + 10,000 = 27,576
Think of this as having separate paths to success - we count how many ways each path offers and sum them up.
Combining the Rules
Many real problems require both multiplication and addition.
Example 1. Suppose a password must be EITHER:
- A letter followed by two digits (26 × 10 × 10 possibilities) OR
- Three symbols (4 × 4 × 4 possibilities)
Total = (26 × 10 × 10) + (4 × 4 × 4) = 2,600 + 64 = 2,664
Understanding when to multiply (AND situations) versus when to add (OR situations) is key to solving counting problems correctly.
Example 2. Calvin wants to reach Milwaukee and has these options:
- First leg (home → Chicago): 3 bus services OR 2 train services
- Second leg (Chicago → Milwaukee): 2 bus services OR 3 train services
To solve this:
- First leg options = 3 + 2 = 5 ways
- Second leg options = 2 + 3 = 5 ways
- Total routes = 5 × 5 = 25 possibilities
Why multiply at the end? Because for EACH way of reaching Chicago, Calvin can use ANY of the ways to reach Milwaukee. This creates 25 unique combinations like:
- Bus 1 → Bus 1
- Bus 1 → Train 1
- Bus 2 → Bus 2 …and so on.
The key is recognizing whether you’re dealing with sequential choices (multiply) or alternative options (add) at each step. Master this distinction, and you’ll solve complex counting problems with ease.
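Calvin's trip combines both rules, and a brute-force listing makes the structure visible: add within a leg, multiply across legs. The service labels below (such as `"Bus 1"`) are hypothetical names used only to tell the options apart.

```python
from itertools import product

# Addition rule within each leg: bus options OR train options.
first_leg = [f"Bus {i}" for i in (1, 2, 3)] + [f"Train {i}" for i in (1, 2)]
second_leg = [f"Bus {i}" for i in (1, 2)] + [f"Train {i}" for i in (1, 2, 3)]

# Multiplication rule across legs: a first-leg choice AND a second-leg choice.
routes = list(product(first_leg, second_leg))

print(len(first_leg), len(second_leg), len(routes))
```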
flowchart TD
    subgraph Addition_Rules[Addition Rule Examples]
        direction TB
        subgraph Exclusive[Mutually Exclusive Example]
            direction TB
            A1["Coin Flip"] --> AH((Heads))
            A1 --> AT((Tails))
            AT --> AR["Total = 1 + 1 = 2\nNo overlap possible"]
            AH --> AR
        end
        subgraph Overlapping[Overlapping Sets Example]
            direction TB
            O1["Students in\nClubs"] --> OS["Science Club\n25 students"] & OA["Art Club\n20 students"]
            OS & OA --> OI["Both Clubs\n8 students"]
            OI --> OT["Total = 25 + 20 - 8\n= 37 students"]
        end
    end
    classDef default fill:#f5f5f5,stroke:#333,color:#000
    classDef set fill:#e6e6e6,stroke:#333,color:#000
    classDef result fill:#d9d9d9,stroke:#333,color:#000
    classDef option fill:#ffffff,stroke:#333,color:#000
    classDef example fill:#f0f0f0,stroke:#333,color:#000
    class A1,O1 set
    class AR,OT result
    class AH,AT option
    class Exclusive,Overlapping example
flowchart TD
    subgraph Multiplication_Rule[Multiplication Rule Examples]
        direction TB
        M1["Choose Breakfast"] --> MA["Drink\n(3 options)"] & MB["Food\n(2 options)"]
        MA --> MA1((Coffee)) & MA2((Tea)) & MA3((Juice))
        MB --> MB1((Toast)) & MB2((Cereal))
        MA1 & MA2 & MA3 --> MR["Total Combinations:\n3 drinks × 2 foods\n= 6 possible breakfasts"]
        MB1 & MB2 --> MR
        subgraph Tree_Example[Tree Diagram]
            direction TB
            T1["PIN First Digit\n(0-9)"] --> T2["Second Digit\n(0-9)"]
            T2 --> T3["10 × 10 = 100\ntotal combinations"]
        end
    end
    classDef default fill:#f5f5f5,stroke:#333,color:#000
    classDef set fill:#e6e6e6,stroke:#333,color:#000
    classDef result fill:#d9d9d9,stroke:#333,color:#000
    classDef option fill:#ffffff,stroke:#333,color:#000
    classDef example fill:#f0f0f0,stroke:#333,color:#000
    class MA,MB set
    class MR,T3 result
    class MA1,MA2,MA3,MB1,MB2 option
    class Tree_Example example
The Inclusion-Exclusion Principle states that for two sets A and B:
|A \cup B| = |A| + |B| - |A \cap B|
This means: The size of their union equals the sum of their individual sizes, minus their intersection (to avoid double counting shared elements).
For three sets A, B, and C, the principle extends to:
|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |B \cap C| - |A \cap C| + |A \cap B \cap C|
This pattern continues for more sets, alternating between adding and subtracting intersections of increasing size.
A simple example:
- Set A: Students who play soccer (20 students)
- Set B: Students who play basketball (15 students)
- 8 students play both sports
- Total students in either sport = 20 + 15 - 8 = 27 students
The principle is essential in probability theory, combinatorics, and set theory. It helps us correctly count elements when sets overlap, avoiding the common error of double-counting shared elements.
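Both forms of the principle can be verified numerically. In this sketch the student rosters are simulated with integer IDs (an assumption made here so that 8 IDs overlap), and the three-set check uses multiples of 2, 3, and 5 below 30 as arbitrary illustrative sets.

```python
# Two sets: soccer (20 students) and basketball (15 students), 8 in both.
soccer = set(range(0, 20))         # hypothetical student IDs 0..19
basketball = set(range(12, 27))    # hypothetical student IDs 12..26
either_sport = len(soccer | basketball)
two_set_formula = len(soccer) + len(basketball) - len(soccer & basketball)

# Three sets: multiples of 2, 3, and 5 among 0..29.
A = set(range(0, 30, 2))
B = set(range(0, 30, 3))
C = set(range(0, 30, 5))
direct_count = len(A | B | C)
three_set_formula = (
    len(A) + len(B) + len(C)
    - len(A & B) - len(B & C) - len(A & C)
    + len(A & B & C)
)

print(either_sport, two_set_formula, direct_count, three_set_formula)
```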
17.4 Probability Theory: Basic Concepts and Rules
Probability theory provides a rigorous foundation for quantifying uncertainty and analyzing random phenomena.
Random Experiments
A random experiment is any procedure that has a well-defined set of possible outcomes but whose specific result cannot be predicted with certainty.
Properties:
- Repeatable under identical conditions
- Known possible outcomes
- Unpredictable specific results
Sample Space (S)
- Complete set of all possible outcomes of a random experiment
- Denoted by S (or \Omega)
- Properties:
- Mutually exclusive outcomes
- Collectively exhaustive
Examples:
- Coin flip: S = \{H, T\}
- Die roll: S = \{1, 2, 3, 4, 5, 6\}
The Event Space: What Can Happen in an Experiment
Events are subsets of the sample space S. This means we can use standard set operations to work with them in a precise, mathematical way.
The event space \mathcal{F} is a collection of all events (outcomes or sets of outcomes) that we can assign probabilities to in an experiment. It must follow three fundamental rules:
- Complete Space Rule
- The entire sample space S must be in \mathcal{F}
- This means all possible outcomes together form a valid event
- Complement Rule
- If event A is in \mathcal{F}, then “not A” (written as A^c) must also be in \mathcal{F}
- Example: If “getting heads” is an event, “not getting heads” must also be an event
- Union Rule
- If we have any sequence of events A_1, A_2, ... in \mathcal{F}, their union must also be in \mathcal{F}
- This means we can combine valid events to form new valid events
In probability, we encounter two fundamentally different types of random events: those we can count (discrete) and those we can measure (continuous). This distinction shapes how we calculate and interpret probabilities.
Discrete Probability
What: Events that can be counted with whole numbers
- Like counting marbles, rolling dice, or flipping coins
- Has “gaps” between possible values
Key Examples:
- Rolling a die
- Possible outcomes: 1, 2, 3, 4, 5, or 6
- Nothing in between (can’t roll a 2.5)
- Can say: P(\text{rolling a 6}) = \frac{1}{6}
- Number of customers per hour
- Could be 0, 1, 2, 3, …
- Can’t have 2.7 customers
- Can say: P(\text{exactly 5 customers}) = 0.1
Continuous Probability
What: Events measured on a continuous scale
- Like measuring height, time, or temperature
- Values flow smoothly with no gaps
Key Examples:
- Person’s height
- Could be 170cm, 170.1cm, 170.11cm, …
- Can measure with increasing precision
- Must use ranges: P(170 \leq \text{height} \leq 171)
- Time until next bus arrives
- Could be 5 mins, 5.1 mins, 5.01 mins, …
- Infinitely divisible
- Must use ranges: P(\text{waiting time} \leq 10 \text{ mins})
Critical Differences
- Individual Values
- Discrete: Can have positive probability
- P(\text{rolling a 6}) = \frac{1}{6} > 0
- Continuous: Always have zero probability
- P(\text{height} = 170.000...) = 0
- How We Calculate
- Discrete: Can sum individual probabilities
- Continuous: Must use ranges and integrals
Real-World Application
Think about a pizza delivery:
- Discrete: Number of toppings (1, 2, 3, …)
- Continuous: Delivery time (15.7 minutes, 15.73 minutes, …)
Why Understanding This Matters
- Helps choose appropriate probability tools
- Guides how we collect and analyze data
- Determines how we express uncertainty
- Shapes how we make predictions
This foundation helps us tackle real-world probability problems with the right approach!
Let’s examine a simple coin flip:
- Sample space: S = \{H, T\} (Heads or Tails)
- The complete event space: \mathcal{F} = \{\emptyset, \{H\}, \{T\}, \{H,T\}\}
- \emptyset : impossible event (no outcomes)
- \{H\} : getting Heads
- \{T\} : getting Tails
- \{H,T\} : getting either Heads or Tails
Verifying the rules:
- Rule 1: \{H,T\} (the sample space) is included
- Rule 2: For event \{H\}, its complement \{T\} is included
- Rule 3: The union of any events (like \{H\} \cup \{T\} = \{H,T\}) is included
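The same verification can be automated. The function `is_event_space` below is a hypothetical helper written for this sketch; because the sample space is finite, checking unions of pairs is enough to cover unions of any sequence of events.

```python
def is_event_space(F, S):
    """Check the three event-space rules for a finite collection F of subsets of S."""
    contains_sample_space = S in F
    closed_under_complement = all((S - A) in F for A in F)
    closed_under_union = all((A | B) in F for A in F for B in F)
    return contains_sample_space and closed_under_complement and closed_under_union

S = frozenset({"H", "T"})
F = {frozenset(), frozenset({"H"}), frozenset({"T"}), S}

# A collection that breaks Rule 2: {H} is present but its complement {T} is not.
F_missing_tails = {frozenset(), frozenset({"H"}), S}

print(is_event_space(F, S), is_event_space(F_missing_tails, S))
```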
There’s an important distinction between outcomes (also called simple events) and events:
An outcome or simple event is a single, indivisible result of an experiment. For example, getting heads on a single coin flip is an outcome.
An event is a set of outcomes - it can contain one outcome, multiple outcomes, or even no outcomes (the empty set). For example, “getting at least one head when flipping two coins” is an event containing multiple outcomes.
Let’s illustrate this with two coin flips where:
S = \{HH, HT, TH, TT\} (our sample space)
The power set (all possible events) would contain 2^4 = 16 events:
- \emptyset (impossible event)
- Single outcomes: \{HH\}, \{HT\}, \{TH\}, \{TT\}
- Pairs of outcomes: \{HH,HT\}, \{HH,TH\}, \{HH,TT\}, \{HT,TH\}, \{HT,TT\}, \{TH,TT\}
- Triples: \{HH,HT,TH\}, \{HH,HT,TT\}, \{HH,TH,TT\}, \{HT,TH,TT\}
- Complete sample space: \{HH,HT,TH,TT\}
These events fall into four types:
- Simple Events: Single outcomes
- Compound Events: Multiple outcomes
- Sure (or Certain) Event: Sample space S
- Impossible Event: Empty set \emptyset
Probability Measure and Probability Axioms: Assigning Numbers to Events
A probability measure P is a way to quantify how likely events are to occur. It takes any event from our event space \mathcal{F} and assigns it a number between 0 and 1.
This assignment follows three essential rules/axioms. These axioms, introduced by Andrey Kolmogorov in 1933, serve as the foundation for all probability calculations:
- Non-Negativity Rule
- For any event A, its probability must be at least 0: P(A) \geq 0
- We can never have a negative probability
- Example: If we roll a die, the probability of getting a 6 is \frac{1}{6} (it cannot be negative)
- Total Probability Rule
- The probability of all possible outcomes must equal 1: P(S) = 1
- Something must happen - the probabilities of all possibilities add up to 100%
- Example: For a fair coin, P(\text{heads}) + P(\text{tails}) = 0.5 + 0.5 = 1
- Addition Rule for Non-Overlapping Events
- If events cannot happen together (they’re “disjoint”), the probability of their union equals the sum of their individual probabilities
- Written formally: P(A_1 \cup A_2 \cup ...) = P(A_1) + P(A_2) + ...
- Example: In drawing a card, P(\text{getting ace}) = P(\text{ace of hearts}) + P(\text{ace of diamonds}) + P(\text{ace of clubs}) + P(\text{ace of spades})
These rules/axioms ensure that our probability assignments make logical sense and match our intuitive understanding of chance and likelihood.
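As an illustration, the three axioms can be checked exhaustively for a fair die. The sketch assumes the equally likely measure P(A) = |A| / 6 and uses exact fractions to avoid rounding issues. On a finite sample space, additivity for every pair of disjoint events implies additivity for any finite collection.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))   # sample space of one die roll

def P(event):
    """Probability measure for a fair die (assumed): P(A) = |A| / |S|."""
    return Fraction(len(event), len(S))

# Every event, i.e. every subset of S (2^6 = 64 of them).
events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]

non_negative = all(P(A) >= 0 for A in events)                # Axiom 1
normalized = P(S) == 1                                       # Axiom 2
additive = all(                                              # Axiom 3 (pairs)
    P(A | B) == P(A) + P(B)
    for A in events for B in events
    if not (A & B)                                           # only disjoint pairs
)

print(non_negative, normalized, additive)
```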
Let’s clarify some closely related but distinct concepts:
Probabilistic Model consists of two fundamental elements:
- A sample space \Omega (or S): the set of all possible outcomes
- A probability law that assigns probabilities to events (subsets of \Omega)
Probability Measure (P):
- The formal mathematical function that maps events to numbers in [0,1]
- Must satisfy the three axioms (non-negativity, normalization, additivity)
- Example: P(A) gives the probability of event A occurring
Probability Distribution:
- The specific assignment of probabilities to all possible outcomes
- Describes how probability is distributed across the sample space
- Example: For a fair die, {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}
Probability Law:
- Often used as a synonym for probability distribution
- Can also refer to the underlying rule generating the probabilities
- Example: “Each face of a fair die has equal probability”
In practice, these terms are interrelated: The probability measure implements the probability law, which determines the probability distribution, all within the context of a probabilistic model.
In probability theory and statistics, being able to visualize sample spaces is crucial for understanding possible outcomes and their relationships. We’ll explore three main approaches to visualizing sample spaces:
- Venn Diagrams
- Tree Diagrams
- Grid/Matrix Diagrams
Venn Diagrams
Venn diagrams provide a powerful visual tool for understanding sample spaces.
- A Venn diagram is a graphical representation of sets and their relationships using (overlapping or disjoint) circles or other shapes.
- Think of each shape in a Venn diagram as a container that holds items with specific characteristics. Where these shapes overlap, we find items that share characteristics of multiple groups.
In probability theory, our sample space (usually denoted by Ω or S) represents all possible outcomes of an experiment. When we draw a Venn diagram, the rectangular frame represents this entire sample space (a universal set), with a probability of 1. Any event is then a subset of this space.
Tree Diagrams
Tree diagrams are particularly useful for visualizing sequential events and their outcomes. Here’s a tree diagram showing a simple probability experiment: We toss a fair coin twice.
graph LR
    Start[Start] --> H1[H]
    Start --> T1[T]
    H1 --> H2[H]
    H1 --> T2[T]
    T1 --> H3[H]
    T1 --> T3[T]
    H2 --> HH([HH: 1/4])
    T2 --> HT([HT: 1/4])
    H3 --> TH([TH: 1/4])
    T3 --> TT([TT: 1/4])
    linkStyle 0,2,3 stroke:#1e88e5,stroke-width:2px
    linkStyle 4,5 stroke:#ff5252,stroke-width:2px
    linkStyle 1 stroke:#ff5252,stroke-width:2px
    style Start fill:#f5f5f5,stroke:#333,stroke-width:2px
    style H1 fill:#bbdefb,stroke:#1e88e5,stroke-width:2px
    style T1 fill:#ffcdd2,stroke:#ff5252,stroke-width:2px
    style H2 fill:#bbdefb,stroke:#1e88e5,stroke-width:2px
    style T2 fill:#ffcdd2,stroke:#ff5252,stroke-width:2px
    style H3 fill:#bbdefb,stroke:#1e88e5,stroke-width:2px
    style T3 fill:#ffcdd2,stroke:#ff5252,stroke-width:2px
    style HH fill:#f5f5f5,stroke:#333,stroke-width:2px
    style HT fill:#f5f5f5,stroke:#333,stroke-width:2px
    style TH fill:#f5f5f5,stroke:#333,stroke-width:2px
    style TT fill:#f5f5f5,stroke:#333,stroke-width:2px
Grid/Matrix Diagrams
Grid diagrams are excellent for showing combinations of events.
Scenario: We have 7 balls in the bag:
- 4 red balls (R₁, R₂, R₃, R₄)
- 3 black balls (B₁, B₂, B₃)
- We’ll draw 2 balls without replacement
Let’s visualize the entire sample space using a grid where each cell represents selecting two balls in order (first draw → columns, second draw ↓ rows):
Second Draw ↓ / First Draw → | R₁ | R₂ | R₃ | R₄ | B₁ | B₂ | B₃ |
---|---|---|---|---|---|---|---|
R₁ | X | ⚫ | ⚫ | ⚫ | ⚪ | ⚪ | ⚪ |
R₂ | ⚫ | X | ⚫ | ⚫ | ⚪ | ⚪ | ⚪ |
R₃ | ⚫ | ⚫ | X | ⚫ | ⚪ | ⚪ | ⚪ |
R₄ | ⚫ | ⚫ | ⚫ | X | ⚪ | ⚪ | ⚪ |
B₁ | ⚪ | ⚪ | ⚪ | ⚪ | X | ⚫ | ⚫ |
B₂ | ⚪ | ⚪ | ⚪ | ⚪ | ⚫ | X | ⚫ |
B₃ | ⚪ | ⚪ | ⚪ | ⚪ | ⚫ | ⚫ | X |
Where:
- X: Impossible (same ball twice)
- ⚫: Both same color (both red in upper-left, both black in lower-right)
- ⚪: Different colors (red-black or black-red)
From this grid:
- Both red = 12 outcomes (⚫ in upper-left quadrant)
- Both black = 6 outcomes (⚫ in lower-right quadrant)
- Red then black = 12 outcomes (⚪ in lower-left quadrant)
- Black then red = 12 outcomes (⚪ in upper-right quadrant)
- Total possible outcomes = 42 (remove seven diagonal X’s from 7 × 7 grid)
Total count verification: 12 + 6 + 12 + 12 = 42 outcomes
Note: Each outcome is determined by reading first draw (column) then second draw (row).
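The grid counts can be confirmed by enumerating ordered draws. This sketch uses `itertools.permutations`, which produces every ordered pair of distinct balls (so the diagonal "same ball twice" cells never appear). The labels `"R1"` through `"B3"` are plain-text stand-ins for the subscripted names.

```python
from itertools import permutations

balls = ["R1", "R2", "R3", "R4", "B1", "B2", "B3"]

# Ordered draws without replacement: (first draw, second draw).
ordered_draws = list(permutations(balls, 2))

def color(ball):
    """The first character of the label encodes the colour: 'R' or 'B'."""
    return ball[0]

both_red = sum(1 for a, b in ordered_draws if color(a) == "R" and color(b) == "R")
both_black = sum(1 for a, b in ordered_draws if color(a) == "B" and color(b) == "B")
red_then_black = sum(1 for a, b in ordered_draws if color(a) == "R" and color(b) == "B")
black_then_red = sum(1 for a, b in ordered_draws if color(a) == "B" and color(b) == "R")

print(len(ordered_draws), both_red, both_black, red_then_black, black_then_red)
```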
Grid/Matrix Diagrams with Unordered Pairs
For this modified scenario where order doesn’t matter, we need to adjust our counting since (R₁,B₁) and (B₁,R₁) would be considered the same outcome.
Scenario: We have 7 balls in the bag:
- 4 red balls (R₁, R₂, R₃, R₄)
- 3 black balls (B₁, B₂, B₃)
- We’ll draw 2 balls without replacement
- Order does NOT matter
Because order doesn’t matter, we only need to look at half of the grid, excluding the diagonal.
Second Draw ↓ / First Draw → | R₁ | R₂ | R₃ | R₄ | B₁ | B₂ | B₃ |
---|---|---|---|---|---|---|---|
R₁ | X | ⚫ | ⚫ | ⚫ | ⚪ | ⚪ | ⚪ |
R₂ | – | X | ⚫ | ⚫ | ⚪ | ⚪ | ⚪ |
R₃ | – | – | X | ⚫ | ⚪ | ⚪ | ⚪ |
R₄ | – | – | – | X | ⚪ | ⚪ | ⚪ |
B₁ | – | – | – | – | X | ⚫ | ⚫ |
B₂ | – | – | – | – | – | X | ⚫ |
B₃ | – | – | – | – | – | – | X |
Where:
- X: Impossible (same ball twice)
- ⚫: Both same color
- ⚪: Different colors
- –: Redundant (already counted in upper half)
From this grid:
- Both red = 6 outcomes (⚫ in upper-left quadrant)
- Both black = 3 outcomes (⚫ in lower-right quadrant)
- One red and one black = 12 outcomes (⚪ only counted once)
- Total possible outcomes = 21 (half of the ordered outcomes: 42 ÷ 2)
Total count verification: 6 + 3 + 12 = 21 unordered outcomes
Note: Each unordered pair {R₁,B₁} is counted only once, whereas in the ordered scenario we counted both (R₁,B₁) and (B₁,R₁).
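For the unordered case, `itertools.combinations` yields each pair of balls exactly once, which corresponds to keeping only the upper half of the grid. As before, the labels are plain-text stand-ins for the subscripted ball names.

```python
from itertools import combinations

balls = ["R1", "R2", "R3", "R4", "B1", "B2", "B3"]

# Unordered draws: each pair of distinct balls appears once.
unordered_draws = list(combinations(balls, 2))

def colors(pair):
    """Return the sorted colour letters of a pair, e.g. 'BR' for one of each."""
    return "".join(sorted(ball[0] for ball in pair))

both_red = sum(1 for pair in unordered_draws if colors(pair) == "RR")
both_black = sum(1 for pair in unordered_draws if colors(pair) == "BB")
one_of_each = sum(1 for pair in unordered_draws if colors(pair) == "BR")

print(len(unordered_draws), both_red, both_black, one_of_each)
```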
Discrete Sample Spaces & Probability
When we analyze random events like coin flips or dice rolls, we need a way to list all possible outcomes. This is where discrete sample spaces come in. Let’s break this down step by step:
Discrete Sample Spaces (S or \Omega)
Think of a sample space as a container holding all possible outcomes of a random event/experiment. In discrete sample spaces, we can count these outcomes one by one, like counting marbles in a bag. We write it as:
- For a finite sample space: S = \{s_1, s_2, ..., s_n\} (the setting of classical, or ‘naive’, probability when the outcomes are also equally likely)
- For a countably infinite sample space: S = \{s_1, s_2, ...\}
Three Essential Rules
- Must include everything possible (no missing outcomes)
- No overlap between outcomes (each is unique)
- Must be countable (you can list them out)
Examples
1. Equally Likely Outcomes (uniform probability distribution)
These are scenarios where each basic outcome has the same probability:
- Fair die: S = \{1, 2, 3, 4, 5, 6\}, each with probability \frac{1}{6}
- Fair deck: S = \{\text{52 cards}\}, each with probability \frac{1}{52}
- Fair coin: S = \{\text{H}, \text{T}\}, each with probability \frac{1}{2}
2. Events with Different Probabilities
Some (discrete) cases:
Binomial Scenarios (Counting Successes)
- Example: Number of successes in n trials, each with probability p
- Sample Space: S = \{0, 1, 2, ..., n\}
- Outcomes have different probabilities based on:
- Number of ways to get k successes (\binom{n}{k})
- Probability of each arrangement (p^k(1-p)^{n-k})
Geometric Scenarios (Waiting for Success)
- Example: Number of trials until first success, probability p per trial
- Sample Space: S = \{1, 2, 3, ...\}
- Probability decreases with number of trials
- P(\text{first success on trial }n) = (1-p)^{n-1}p
Note: The term “success” in probability simply means the outcome we’re tracking - it could be any event of interest. Each trial has two possible outcomes: success (probability p) or failure (probability 1-p).
How to Assign Probabilities
1. Equal Chances (Classical Probability) When all outcomes are equally likely:
P(\text{one outcome}) = \frac{1}{\text{total outcomes}}
Example: Rolling a fair die
- Probability of rolling a 4 = \frac{1}{6}
2. Mathematical Models (Probability Distribution Functions) For more complex situations, we use specific formulas (functions):
- Multiple trials (Binomial): P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}
- First success (Geometric): P(X=k) = p(1-p)^{k-1}
3. Using Data (Empirical/Experimental Probability) When we have actual observations:
P(\text{outcome}) = \frac{\text{times outcome occurred}}{\text{total observations}}
Example: If you flip a coin 100 times and get 53 heads, the empirical estimate is P(\text{heads}) \approx \frac{53}{100} = 0.53
Classical (“Naive”) Probability: Why Single Outcomes Have Equal Probabilities
Let’s show, step by step, that each of n equally likely outcomes must have probability \frac{1}{n}:
- Start with n equally likely outcomes: s_1, s_2, ..., s_n
- We know the total probability must be 1 (rule 2 above)
- Call the probability of each outcome p
- Since outcomes are equally likely, each has the same probability p
- Adding up all outcomes: p + p + ... + p (n times) = 1
- Therefore: np = 1
- Solving for p: p = \frac{1}{n}
This gives us the following rule under the assumption of equally likely outcomes:
P(\text{single outcome}) = \frac{1}{\text{number of possible outcomes}}
Important Considerations
Classical probability (equal likelihood) is a special case
Many real-world phenomena follow specific probability distributions (a probability distribution is the mathematical function that gives the probabilities of the possible outcomes of an experiment)
The type of probability assignment depends on the context:
- Physical symmetry → Classical probability
- Repeated independent trials → Binomial distribution
- Rare events → Poisson distribution
- Waiting times → Geometric distribution
- (…)
All these cases work within discrete sample spaces
The probabilities must always sum to 1 over the entire sample space
This framework helps us systematically study discrete random variables and their probabilities, whether they follow uniform or non-uniform distributions.
There is no single, universal formula for calculating probabilities across all probability spaces and situations.
Different Types of Probability Spaces
- Classical (Finite, Equally Likely)
- Only here we can use: P(A) = \frac{\text{favorable outcomes}}{\text{total outcomes}}
- Limited to finite, equally likely cases
- General Discrete
- Must specify individual probabilities
- Example: Loaded die needs experimental/empirical determination
- Sum of probabilities must equal 1
- Continuous
- Uses calculus and density functions
- Probabilities found by integration
- Example: P(a \leq X \leq b) = \int_a^b f(x)dx
- No universal formula for density function
- Mixed/Hybrid
- Combines discrete and continuous elements
- Different methods needed for different parts
Why No Universal Formula?
- Different types of randomness need different mathematical tools
- Nature of outcomes (discrete/continuous) affects calculation method
- Prior knowledge or assumptions shape probability calculation
- Some probabilities must be found empirically (frequentist/statistical probability) rather than calculated
Frequentist probability defines probability as the long-term relative frequency of an event’s occurrence in repeated trials under identical conditions:
P(A) = \lim_{n \to \infty} \frac{\text{number of times A occurs}}{n}
The Law of Large Numbers states that as we increase the number of trials, the observed frequency converges to the true probability:
- If true probability of heads is 0.5
- In 10 flips: might get 7 heads (frequency = 0.7)
- In 1000 flips: might get 495 heads (frequency ≈ 0.495)
- As trials → ∞, frequency → 0.5
Key Characteristics:
- Requires repeatable experiments under identical conditions
- Objective approach - probability viewed as physical property
- Cannot handle one-time events
- Foundation for classical statistical inference
Limitation: We can never perform infinite trials, so we estimate probabilities from large but finite samples.
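The convergence of relative frequencies can be illustrated with a small simulation. This is only a sketch: it assumes a simulated fair coin (p = 0.5) and a fixed seed for reproducibility, and the helper name `relative_frequency` is hypothetical.

```python
import random


def relative_frequency(num_flips, p_heads=0.5, seed=0):
    """Simulate num_flips coin flips and return the observed share of heads."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    heads = sum(1 for _ in range(num_flips) if rng.random() < p_heads)
    return heads / num_flips


# The observed frequency tends to settle near 0.5 as the number of flips grows
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

For small n the frequency can be far from 0.5; for large n it is typically close, but a finite simulation only ever gives an estimate, which is exactly the limitation noted above.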
Classical probability, also known as Laplace probability, applies when all outcomes in a finite sample space are equally likely to occur. This framework, developed by Pierre-Simon Laplace, provides a simple yet powerful way to calculate probabilities when symmetry exists.
Mathematical Foundation
For any event A in sample space S, the classical probability is calculated as:
P(A) = \frac{\text{number of favorable outcomes}}{\text{number of possible outcomes}} = \frac{|A|}{|S|}
This definition automatically satisfies Kolmogorov’s probability axioms:
- Non-negativity: P(A) \geq 0 for all A \subseteq S
- Normalization: P(S) = \frac{|S|}{|S|} = 1
- Additivity: For disjoint events A and B, P(A \cup B) = P(A) + P(B)
Key Requirements
Two essential conditions must be met:
- The sample space S must be finite
- Each elementary outcome s \in S must be equally likely, with probability P(\{s\}) = \frac{1}{|S|}
Common Applications
- Fair Dice
- Rolling a six: P(6) = \frac{1}{6}
- Rolling an even number: P(\{2,4,6\}) = \frac{3}{6} = \frac{1}{2}
- Rolling a number greater than 4: P(\{5,6\}) = \frac{2}{6} = \frac{1}{3}
- Playing Cards
- Drawing a heart: P(♥) = \frac{13}{52} = \frac{1}{4}
- Drawing a face card: P(J,Q,K) = \frac{12}{52} = \frac{3}{13}
- Drawing a red ace: P(\text{A♥,A♦}) = \frac{2}{52} = \frac{1}{26}
- Multiple Coin Flips
- Two fair coins: S = \{HH, HT, TH, TT\}
- P(\text{exactly one head}) = \frac{|\{HT,TH\}|}{|\{HH,HT,TH,TT\}|} = \frac{2}{4} = \frac{1}{2}
- P(\text{at least one head}) = \frac{|\{HH,HT,TH\}|}{|\{HH,HT,TH,TT\}|} = \frac{3}{4}
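The applications listed above can be reproduced by listing the sample space and counting. The sketch below assumes equally likely outcomes throughout; `classical_probability` is a helper name introduced here for illustration, and the card labels are our own encoding of a standard 52-card deck.

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, sample_space):
    """|A| / |S| for a finite sample space with equally likely outcomes."""
    outcomes = list(sample_space)
    favorable = [s for s in outcomes if event(s)]
    return Fraction(len(favorable), len(outcomes))


# Fair die
die = range(1, 7)
p_even = classical_probability(lambda s: s % 2 == 0, die)
p_greater_than_4 = classical_probability(lambda s: s > 4, die)

# Standard deck: 13 ranks x 4 suits = 52 cards
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))
p_heart = classical_probability(lambda c: c[1] == "hearts", deck)
p_face = classical_probability(lambda c: c[0] in {"J", "Q", "K"}, deck)
p_red_ace = classical_probability(
    lambda c: c[0] == "A" and c[1] in {"hearts", "diamonds"}, deck
)

# Two fair coins: S = {HH, HT, TH, TT}
two_coins = ["".join(flips) for flips in product("HT", repeat=2)]
p_exactly_one_head = classical_probability(lambda s: s.count("H") == 1, two_coins)
p_at_least_one_head = classical_probability(lambda s: "H" in s, two_coins)

print(p_even, p_greater_than_4, p_heart, p_face, p_red_ace)
print(p_exactly_one_head, p_at_least_one_head)
```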
Limitations and Considerations
- Equally Likely Assumption
- This framework fails for biased coins, loaded dice, or any scenario where outcomes aren’t equally likely
- In such cases, we need empirical probability or other probability measures
- Finite Space Requirement
- Cannot directly apply to infinite sample spaces
- Requires modification for continuous probability spaces
- Symmetry Assessment
- Physical symmetry (as in fair dice) often suggests equal likelihood
- But physical symmetry alone doesn’t guarantee equal probabilities in practice
Connection to Other Probability Concepts
Classical probability serves as a foundation for understanding more complex probability concepts:
- Forms the basis for combinatorial probability
- Provides intuition for uniform distributions
- Helps in understanding probability density functions in continuous spaces
Note: While classical probability is intuitive and mathematically elegant, real-world applications often require more general probability frameworks to handle non-uniform probabilities and infinite sample spaces.
Classical (or ‘Naive’) Probability: The Equal-Likelihood Special Case in Discrete Sample Spaces
Classical probability applies to finite sample spaces where all outcomes are equally likely to occur. The probability formula is:
P(\text{event}) = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}}
Requirements:
- Finite sample space (finite number of outcomes)
- All outcomes equally likely
- Total probability sums to 1
Examples:
- Fair die roll:
- P(\text{rolling a 3}) = \frac{1}{6}
- P(\text{rolling an even number}) = \frac{3}{6} = \frac{1}{2}
- Drawing from a standard deck:
- P(\text{drawing an ace}) = \frac{4}{52} = \frac{1}{13}
- P(\text{drawing a heart}) = \frac{13}{52} = \frac{1}{4}
Key Limitation:
Classical probability fails when outcomes are not equally likely (e.g., loaded die) or when the sample space is infinite.
This is why it’s called “naive” probability - it assumes a simple, idealized situation where simple counting is sufficient.
Starting Assumptions
When developing classical probability theory, we begin with a probability experiment that has two key properties:
- The sample space S is finite, with cardinality |S| = n
- All elementary outcomes are equally likely
We can write our sample space explicitly as: S = \{s_1, s_2, ..., s_n\}
This leads us to classical (or Laplace) probability, which we’ll derive rigorously in the following section. This special case provides a foundation for understanding more complex probability scenarios and helps build crucial probabilistic intuition.
Core Axioms of Probability Theory
To derive classical probability, we start with the aforementioned key probability axioms. These form the mathematical foundation for all probability theory, including the classical or ‘naive’ probability:
- For any event A, P(A) \geq 0 (Non-negativity)
- P(S) = 1 (Total probability)
- For disjoint events A and B, P(A \cup B) = P(A) + P(B) (Additivity)
Deriving the Classical (‘Naive’) Probability Formula (Equal-Likelihood Case)
Starting Point
In a fair game or unbiased experiment where all outcomes are equally likely, we can derive the famous “number of favorable outcomes divided by total outcomes” formula.
The Setup
Consider a finite sample space S with n outcomes: \{s_1, s_2, ..., s_n\}
Equal-Likelihood Assumption:
- Each outcome has the same probability p
- Mathematically: P(\{s_1\}) = P(\{s_2\}) = ... = P(\{s_n\}) = p
Step-by-Step Derivation
Use Total Probability Axiom
- All probabilities must sum to 1
- For our n equally likely outcomes:
\begin{align*} P(S) &= P(\{s_1\}) + P(\{s_2\}) + ... + P(\{s_n\}) = 1 \\ \underbrace{p + p + ... + p}_{n \text{ terms}} &= 1 \\ np &= 1 \\ p &= \frac{1}{n} = \frac{1}{|S|} \end{align*}
Calculate Probability of Any Event A
- Let event A contain k outcomes
- By the addition rule for disjoint events:
\begin{align*} P(A) &= \underbrace{p + p + ... + p}_{k \text{ terms}} \\ &= k \cdot \frac{1}{|S|} = \frac{k}{|S|} = \frac{|A|}{|S|} \end{align*}
The Classical Probability Formula
P(A) = \frac{|A|}{|S|} = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}}
Examples to Illustrate
- Fair Die Roll
- S = \{1,2,3,4,5,6\}, so |S| = 6
- For getting an even number, A = \{2,4,6\}, so |A| = 3
- P(A) = \frac{3}{6} = \frac{1}{2}
- Drawing a Card
- |S| = 52 (total cards)
- For drawing a king, |A| = 4
- P(\text{king}) = \frac{4}{52} = \frac{1}{13}
Important Limitations
This formula only works when:
- Sample space is finite (we can count outcomes)
- All outcomes are equally likely
- Each outcome is distinct and well-defined
If any of these conditions fail (like with a loaded die), we need different methods to calculate probabilities.
Understanding the Result
This derivation reveals several profound insights about classical probability:
The familiar “counting formula” (‘Naive’ probability) isn’t just an intuitive rule - it follows necessarily from our axioms combined with the equal-likelihood assumption. When we say outcomes are equally likely, we’re forced mathematically to assign each elementary outcome a probability of \frac{1}{|S|}. This isn’t a choice but a requirement of the axioms.
For any event A, its probability is determined entirely by comparing two cardinalities: the size of the event (|A|) relative to the size of the sample space (|S|).
A fair coin toss experiment is defined by:
Sample Space:
\Omega = \{H, T\}
Event Space (collection of all possible events):
\mathcal{F} = \{\{H\}, \{T\}, \{H, T\}, \emptyset\}
Probability Measure:
- P(\{H\}) = P(\{T\}) = \frac{1}{2} (probability of heads or tails)
- P(\Omega) = P(\{H, T\}) = 1 (certainty)
- P(\emptyset) = 0 (impossible event)
Key Points:
- This is a simple probability space that satisfies the axioms:
- P(A) \geq 0 for all events A
- P(\Omega) = 1
- P(A \cup B) = P(A) + P(B) for disjoint events
- The event space \mathcal{F} includes:
- Individual outcomes: \{H\} and \{T\}
- The entire sample space: \{H, T\}
- The empty set: \emptyset
- The probabilities are equal (\frac{1}{2}) because it’s a fair coin
This example illustrates a complete probability space with its three components: sample space (\Omega), event space (\mathcal{F}), and probability measure (P).
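The three components of this probability space are small enough to write out and check directly. The following sketch builds the event space as the power set of \Omega and tests the three axioms; the names `power_set` and `P` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({"H", "T"})  # sample space


def power_set(s):
    """All subsets of s, returned as frozensets (the event space for a finite S)."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(sub) for sub in subsets]


event_space = power_set(omega)  # the empty set, {H}, {T}, {H, T}


def P(event):
    """Probability measure for a fair coin: each outcome has weight 1/2."""
    return Fraction(len(event), len(omega))


# Axiom 1: non-negativity
axiom_nonnegative = all(P(a) >= 0 for a in event_space)
# Axiom 2: normalization
axiom_normalized = P(omega) == 1
# Axiom 3: additivity for every pair of disjoint events
axiom_additive = all(
    P(a | b) == P(a) + P(b)
    for a in event_space
    for b in event_space
    if a.isdisjoint(b)
)

print(len(event_space), axiom_nonnegative, axiom_normalized, axiom_additive)
```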
Example Application
Let’s solidify this understanding by working through a concrete example. Consider rolling a fair six-sided die and finding P(\text{even number}):
- First, identify the sample space: S = \{1, 2, 3, 4, 5, 6\}, giving us |S| = 6
- Then, identify the event: A = \{2, 4, 6\}, giving us |A| = 3
- Apply the formula: P(\text{even number}) = \frac{|A|}{|S|} = \frac{3}{6} = \frac{1}{2}
Before we can correctly calculate probabilities in any discrete scenario, we must answer two fundamental questions:
- Does Order Matter?
The importance of order fundamentally changes how we count outcomes. Consider selecting two cards from a deck:
- If we’re playing poker, order doesn’t matter - getting an ace and then a king is the same hand as getting a king and then an ace.
- If we’re performing a magic trick where we need specific cards in sequence, order matters - getting an ace then a king is different from getting a king then an ace.
When order matters, we’re dealing with permutations. When order doesn’t matter, we’re dealing with combinations. This distinction dramatically affects the number of possible outcomes and, consequently, our probability calculations.
- Is Sampling With or Without Replacement?
After selecting an item, do we put it back before the next selection? This question fundamentally changes the probability structure:
- With replacement: Each selection has the same probability distribution as the first selection. Drawing a red ball and replacing it means the probability of drawing red on the next try remains unchanged.
- Without replacement: Each selection changes the probability distribution for subsequent selections. Drawing a red ball and not replacing it means there are fewer red balls available for the next draw.
These sampling schemes lead to different probability models:
- With replacement leads to independent events and often simpler calculations
- Without replacement leads to dependent events and requires conditional probability
The Key Principle
- ADD when events represent different, mutually exclusive ways (paths) to achieve the same outcome; if the events overlap, subtract the overlap
- MULTIPLY when events must occur in sequence (one after another)
Example 1: Single Die Roll Consider events:
- A: “rolling an even number” = {2, 4, 6}
- B: “rolling a number > 4” = {5, 6}
P(A or B) requires ADDITION because we want any outcome satisfying either condition:
- P(A or B) = P(A) + P(B) - P(A and B)
- = 3/6 + 2/6 - 1/6 = 4/6
Example 2: Two Coin Flips
For P(at least one heads):
- ADD different successful paths: P(HT or TH or HH)
- = 1/4 + 1/4 + 1/4 = 3/4
For P(two heads):
- MULTIPLY along path: P(H) × P(H)
- = 1/2 × 1/2 = 1/4
Why This Works
- Addition combines different ways to succeed
- Multiplication reflects narrowing down possibilities with each sequential requirement
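Both examples can be verified by counting outcomes. This is a small sketch assuming a fair die and fair coins; `prob` is a helper defined here, not a library function.

```python
from fractions import Fraction
from itertools import product


def prob(event, sample_space):
    """Classical probability |A| / |S| for equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


# Example 1: single die roll, overlapping events
die = set(range(1, 7))
A = {2, 4, 6}  # even number
B = {5, 6}     # number greater than 4
p_union_direct = prob(A | B, die)
p_union_formula = prob(A, die) + prob(B, die) - prob(A & B, die)

# Example 2: two coin flips
flips = set(product("HT", repeat=2))
at_least_one_head = {f for f in flips if "H" in f}
# ADD over the mutually exclusive paths HH, HT, TH
p_at_least_one_head = sum(prob({f}, flips) for f in at_least_one_head)
# MULTIPLY along the single path H then H
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

print(p_union_direct, p_union_formula)   # both equal 2/3
print(p_at_least_one_head, p_two_heads)  # 3/4 and 1/4
```

Note that 4/6 in Example 1 reduces to 2/3, which is what `Fraction` reports.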
graph LR Start[Start] --> H1[H] Start --> T1[T] H1 --> H2[H] H1 --> T2[T] T1 --> H3[H] T1 --> T3[T]
Understanding Classical Probability Through the Urn Example
In our urn with 3 green and 2 red balls:
- P(\text{green}) = \frac{3}{5}
- P(\text{red}) = \frac{2}{5}
- P(\text{green}) + P(\text{red}) = 1
The classical definition of probability assumes:
- A finite sample space \Omega with equally likely outcomes (‘fair’ experiment)
- For an event A, probability is defined as: P(A) = \frac{\text{favorable outcomes}}{\text{total outcomes}}
In our urn with 3 green and 2 red balls, these assumptions manifest as:
- Sample space \Omega = \{b_1, b_2, b_3, b_4, b_5\} where each ball is equally likely
- For green: P(\text{green}) = \frac{|\text{green balls}|}{|\Omega|} = \frac{3}{5}
- For red: P(\text{red}) = \frac{|\text{red balls}|}{|\Omega|} = \frac{2}{5}
Key probability axioms are demonstrated:
- Non-negativity: P(\text{green}), P(\text{red}) \geq 0
- Normalization: P(\Omega) = P(\text{green}) + P(\text{red}) = 1
- Additivity: Since green and red are disjoint events, P(\text{green or red}) = P(\text{green}) + P(\text{red})
REMARK: Many probabilistic situations have the property that they involve a number of different possible outcomes, all of which are equally likely. For example, Heads and Tails on a coin are equally likely to be tossed, the numbers 1 through 6 on a die are equally likely to be rolled, and the ten balls in the above box are all equally likely to be picked.
The ‘naive’ (classical) definition of probability assumes a finite sample space with a uniform probability measure (all outcomes equally likely).
When considering shapes or elements of the same color in an urn or box, treating them as distinguishable allows you to assume a uniform sample space — equally likely outcomes.
17.5 How to Calculate Basic Probabilities
Let’s explore some fundamental probability concepts using a simple example with colored balls in an urn/bag. This example will help us understand:
- How to calculate basic probabilities using tree diagrams
- How replacement affects probability
- How the importance of order affects our calculations
- How to break down probability problems into steps
Tree diagrams are powerful tools for visualizing sequential events. Each branch represents a possible outcome, and probabilities multiply along paths.
When we count possibilities (sample space size = |S|) in probability problems, we need to think about two important questions:
- Does the order of our selections matter? (Like picking a phone PIN where 1234 is different from 4321)
- Can we reuse items we’ve already selected? (Like picking letters where we can reuse them, versus picking students where we can’t pick the same person twice)
Consider sampling from elements \{A, B, C\}. The key distinction is whether we care about:
- Sequences (ordered lists): where position matters
- Sets: where only membership matters
17.6 Sampling Scenarios
Sampling Method | With Replacement | Without Replacement |
---|---|---|
Sequences (Order Matters) | Ordered lists with repetition: (A,A), (A,B), (A,C), (B,A), (B,B), (B,C), (C,A), (C,B), (C,C) | Ordered lists without repetition: (A,B), (A,C), (B,A), (B,C), (C,A), (C,B) |
Sets (Order Doesn’t Matter) | Multisets (sets with repetition): \{A,A\}, \{A,B\}, \{A,C\}, \{B,B\}, \{B,C\}, \{C,C\} | Sets (no repetition): \{A,B\}, \{A,C\}, \{B,C\} |
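The four cells of the table correspond to four functions in Python's standard `itertools` module. The sketch below enumerates selections of k = 2 items from \{A, B, C\}; the variable names are our own.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)

elements = ["A", "B", "C"]
k = 2  # number of items selected

# Order matters, repetition allowed
sequences_with_replacement = list(product(elements, repeat=k))
# Order matters, no repetition
sequences_without_replacement = list(permutations(elements, k))
# Order ignored, repetition allowed (multisets)
sets_with_replacement = list(combinations_with_replacement(elements, k))
# Order ignored, no repetition (ordinary subsets)
sets_without_replacement = list(combinations(elements, k))

for name, outcomes in [
    ("sequences, with replacement", sequences_with_replacement),
    ("sequences, without replacement", sequences_without_replacement),
    ("sets, with replacement", sets_with_replacement),
    ("sets, without replacement", sets_without_replacement),
]:
    print(f"{name}: {len(outcomes)} outcomes -> {outcomes}")
```

The counts are 9, 6, 6, and 3, matching the number of entries in each cell of the table.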
17.7 Mathematical Properties
- Sequences (Order Matters)
- Elements have positions: a_1, a_2, \ldots, a_n
- Two sequences \mathbf{x}, \mathbf{y} are equal iff x_i = y_i for all i
- Denoted as ordered tuples: (a_1, a_2, \ldots, a_n)
- Sets (Order Doesn’t Matter)
- Elements have no position, only membership matters
- Two sets are equal if and only if they contain exactly the same elements
- Denoted with curly braces: \{a_1, a_2, \ldots, a_n\}
17.8 Factorial Notation
For any positive integer n, the factorial of n (denoted as n!) is the product of all positive integers up to n:
n! = n \cdot (n-1) \cdot (n-2) \cdot ... \cdot 2 \cdot 1
Special cases:
- 0! = 1 (by definition)
- 1! = 1
- 2! = 2 \cdot 1 = 2
- 3! = 3 \cdot 2 \cdot 1 = 6
- 4! = 4 \cdot 3 \cdot 2 \cdot 1 = 24
This can be written recursively as:
- n! = n \cdot (n-1)! for n > 0
- 0! = 1
Here’s why 0! equals 1:
By definition, for any positive integer n, n! = n × (n-1)!
This means 1! = 1 × 0!
We know 1! = 1
Therefore: 1 = 1 × 0!
Solving for 0!: 0! = 1
This definition is also consistent with the combinatorial interpretation - there is exactly one way to arrange zero elements.
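The recursive definition translates almost line by line into code. This is a minimal sketch for illustration; in practice Python's built-in `math.factorial` does the same job.

```python
import math


def factorial(n):
    """Recursive factorial: 0! = 1 and n! = n * (n - 1)! for n > 0."""
    if n < 0:
        raise ValueError("factorial is only defined for non-negative integers")
    if n == 0:
        return 1  # base case, by definition
    return n * factorial(n - 1)


for n in range(5):
    print(n, factorial(n))  # 1, 1, 2, 6, 24

# Cross-check against the standard library
print(all(factorial(n) == math.factorial(n) for n in range(10)))
```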
17.9 Number of Outcomes
For n distinct elements, selecting k items:
- Sequences with Replacement
- Each position has n choices
- Total: n^k outcomes
- Sequences without Replacement
- Permutations: P(n,k) = \frac{n!}{(n-k)!}
- Each next position has one fewer choice
- Sets with Replacement
- Combinations with repetition allowed
- Total: \binom{n+k-1}{k} outcomes
- Sets without Replacement
- Combinations: \binom{n}{k} = \frac{n!}{k!(n-k)!}
- Each subset of size k counted once
17.10 Key Relationships
For sequences vs sets without replacement:
P(n,k) = k! \cdot \binom{n}{k}
For any sampling scheme:
\text{number of sequences} \geq \text{number of sets}
\text{number of outcomes with replacement} \geq \text{number of outcomes without replacement}
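The four counting formulas and the relationships between them can be checked for small values of n and k. The sketch below uses `math.perm` and `math.comb` (available in Python 3.8 and later); `count_outcomes` is a helper name introduced here.

```python
from math import comb, factorial, perm


def count_outcomes(n, k):
    """Number of ways to select k items from n distinct elements, per scheme."""
    return {
        "sequences, with replacement": n**k,
        "sequences, without replacement": perm(n, k),  # n! / (n - k)!
        "sets, with replacement": comb(n + k - 1, k),
        "sets, without replacement": comb(n, k),       # n! / (k! (n - k)!)
    }


counts = count_outcomes(3, 2)
print(counts)  # 9, 6, 6, 3 for the {A, B, C} example

# Check the key relationships for a range of small n and k
for n in range(1, 7):
    for k in range(0, n + 1):
        c = count_outcomes(n, k)
        assert perm(n, k) == factorial(k) * comb(n, k)
        assert c["sequences, with replacement"] >= c["sets, with replacement"]
        assert c["sequences, without replacement"] >= c["sets, without replacement"]
        assert c["sequences, with replacement"] >= c["sequences, without replacement"]
        assert c["sets, with replacement"] >= c["sets, without replacement"]
```

A check over small values is of course not a proof, but it is a quick way to catch a misremembered formula.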
Example 1: Drawing Two Balls from an Urn/Bag
Consider drawing two balls from an urn containing 3 green and 2 red balls.
Find the probabilities of the following random events:
The first ball is red and the second one is green (order matters, drawing without replacement)
The first ball is red and the second one is green (order matters, drawing with replacement)
The balls are of different colors (order doesn’t matter, drawing without replacement)
The balls are of different colors (order doesn’t matter, drawing with replacement)
Understanding Event Types in Probability:
- Simple events represent a single outcome from a single random action, such as drawing one ball from an urn. The probability of a simple event is calculated directly from the number of favorable outcomes divided by the total possible outcomes.
- Compound events involve multiple outcomes or conditions that must occur together. These can occur simultaneously (like rolling two dice at once) or sequentially (like drawing two balls one after another). The key difference lies in whether the events happen at the same time or in sequence.
- Sequential events are a specific type of compound events where outcomes occur in a particular order over time. Our urn example is particularly instructive here because it demonstrates sequential events through the process of drawing balls one after another. This allows us to explore how the probability of the second draw depends on what happened in the first draw (when sampling without replacement).
To better understand how the sample space changes based on our sampling method, let’s examine two scenarios:
- With Replacement
When we sample with replacement, we return the ball to the urn after the first draw. This means:
- The probability remains constant for each draw
- Total possible outcomes: 25 (5×5 grid)
- Each outcome has equal probability
- P(\text{both green}) = \frac{3}{5} \times \frac{3}{5} = \frac{9}{25}
- P(\text{both red}) = \frac{2}{5} \times \frac{2}{5} = \frac{4}{25}
- P(\text{mixed}) = \frac{12}{25}
- Without Replacement
When we sample without replacement, the first draw affects the probability of the second draw:
- Total possible outcomes: 20 (removing diagonal cells where same ball is drawn twice)
- Second draw probabilities change based on first draw
- P(\text{both green}) = \frac{3}{5} \times \frac{2}{4} = \frac{6}{20}
- P(\text{both red}) = \frac{2}{5} \times \frac{1}{4} = \frac{2}{20}
- P(\text{mixed}) = \frac{12}{20}
The grid diagram above visualizes both scenarios, where:
- Green cells represent both balls drawn being green
- Red cells represent both balls drawn being red
- Orange cells represent mixed outcomes (one green, one red)
- Crossed-out cells in the “Without Replacement” grid show impossible outcomes
This visualization helps demonstrate how the sample space and probabilities change between the two sampling methods, while maintaining the fundamental principle that probabilities must sum to 1 in both cases.
- Drawing Two Balls Without Replacement
Consider drawing two balls from an urn containing 3 green and 2 red balls. Let’s analyze all scenarios systematically.
flowchart TD A(["Initial State\n3G, 2R"]) --> B["First: Green\n3/5"] A --> C["First: Red\n2/5"] B --> D["Second: Green\n2/4"] B --> E["Second: Red\n2/4"] C --> F["Second: Green\n3/4"] C --> G["Second: Red\n1/4"] D --> H["GG: 3/5 × 2/4 = 6/20"] E --> I["GR: 3/5 × 2/4 = 6/20"] F --> J["RG: 2/5 × 3/4 = 6/20"] G --> K["RR: 2/5 × 1/4 = 2/20"]
Let’s solve for different scenarios:
First red, then green (order matters):
P(R \text{ then } G) = \frac{2}{5} \cdot \frac{3}{4} = \frac{6}{20} = 0.3
Different colors (order doesn’t matter):
P(\text{different colors}) = P(R \text{ then } G) + P(G \text{ then } R)
= \frac{2}{5} \cdot \frac{3}{4} + \frac{3}{5} \cdot \frac{2}{4} = \frac{6}{20} + \frac{6}{20} = \frac{12}{20} = 0.6
- Drawing With Replacement
When we replace the first ball before drawing the second, the probabilities for the second draw remain unchanged:
flowchart TD A(["Initial State\n3G, 2R"]) --> B["First: Green\n3/5"] A --> C["First: Red\n2/5"] B --> D["Second: Green\n3/5"] B --> E["Second: Red\n2/5"] C --> F["Second: Green\n3/5"] C --> G["Second: Red\n2/5"] D --> H["GG: 3/5 × 3/5 = 9/25"] E --> I["GR: 3/5 × 2/5 = 6/25"] F --> J["RG: 2/5 × 3/5 = 6/25"] G --> K["RR: 2/5 × 2/5 = 4/25"]
Now:
First red, then green (order matters):
P(R \text{ then } G) = \frac{2}{5} \cdot \frac{3}{5} = \frac{6}{25} = 0.24
Different colors (order doesn’t matter):
P(\text{different colors}) = \frac{2}{5} \cdot \frac{3}{5} + \frac{3}{5} \cdot \frac{2}{5} = \frac{12}{25} = 0.48
Key observations:
- Without replacement:
- Both orders of a mixed draw have the same probability (GR and RG are each 6/20), even though the individual branch probabilities differ
- The second draw’s probability depends on the first outcome
- With replacement:
- Each draw is independent
- Probabilities multiply directly because sample space remains unchanged
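The tree-diagram results can be cross-checked by listing every ordered pair of draws. The sketch below assumes the balls are labeled individually (G1, G2, G3, R1, R2) so that all ordered pairs are equally likely; the labels and helper names are our own.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["G1", "G2", "G3", "R1", "R2"]  # 3 green, 2 red, individually labeled


def color(ball):
    return ball[0]  # "G" or "R"


def prob(event, outcomes):
    """Share of equally likely ordered outcomes that satisfy the event."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def red_then_green(outcome):
    first, second = outcome
    return color(first) == "R" and color(second) == "G"


def different_colors(outcome):
    first, second = outcome
    return color(first) != color(second)


without_replacement = list(permutations(balls, 2))  # 20 ordered pairs
with_replacement = list(product(balls, repeat=2))   # 25 ordered pairs

print(prob(red_then_green, without_replacement))    # 3/10
print(prob(different_colors, without_replacement))  # 3/5
print(prob(red_then_green, with_replacement))       # 6/25
print(prob(different_colors, with_replacement))     # 12/25
```

These agree with 6/20 = 0.3, 12/20 = 0.6, 0.24, and 0.48 from the two tree diagrams.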
Example 2: The 4 Red and 3 Black Balls Problem
Let’s solve a real problem using what we learned. We have:
- 4 red balls (let’s call them R₁, R₂, R₃, R₄)
- 3 black balls (B₁, B₂, B₃)
- We’ll draw 2 balls without replacement
- Order doesn’t matter (like picking team members)
We want to find three probabilities:
- Getting two red balls
- Getting two black balls
- Getting one of each color
Method 1: Using Counting Rules
First, let’s count the total possible outcomes:
- We’re picking 2 balls from 7 total balls, order doesn’t matter
- Total outcomes = \binom{7}{2} = \frac{7!}{2!(7-2)!} = \frac{7 \times 6}{2 \times 1} = 21
Now let’s find each probability:
- Two Red Balls
- We need to pick 2 red balls from 4 red balls
- This is like picking 2 team members from 4 people
- Number of ways = \binom{4}{2} = \frac{4 \times 3}{2 \times 1} = 6
- Probability = \frac{6}{21}
- Two Black Balls
- Similarly, we need to pick 2 black balls from 3 black balls
- Number of ways = \binom{3}{2} = \frac{3 \times 2}{2 \times 1} = 3
- Probability = \frac{3}{21}
- One Red and One Black
- We need:
- One red ball (we have 4 to choose from)
- One black ball (we have 3 to choose from)
- Number of ways = 4 \times 3 = 12
- Probability = \frac{12}{21}
Let’s verify our work:
- All probabilities should add to 1
- \frac{6}{21} + \frac{3}{21} + \frac{12}{21} = \frac{21}{21} = 1 ✓
This matches what we expect - every time we draw two balls, we must get either:
- Two red balls
- Two black balls
- One of each color
Understanding how to count correctly helps us solve these probability problems systematically and avoid common mistakes like counting the same outcome multiple times.
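Method 1 can be written as a few lines of arithmetic with binomial coefficients. This sketch uses `math.comb` and exact fractions; the variable names are chosen for illustration.

```python
from fractions import Fraction
from math import comb

red, black, drawn = 4, 3, 2

total = comb(red + black, drawn)  # unordered pairs from 7 balls: 21

p_two_red = Fraction(comb(red, 2), total)      # 6/21
p_two_black = Fraction(comb(black, 2), total)  # 3/21
p_one_each = Fraction(comb(red, 1) * comb(black, 1), total)  # 12/21

print(total)
print(p_two_red, p_two_black, p_one_each)    # shown in lowest terms
print(p_two_red + p_two_black + p_one_each)  # the three cases cover everything
```

`Fraction` reduces automatically, so 6/21, 3/21, and 12/21 are displayed as 2/7, 1/7, and 4/7.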
Method 2: Tree Diagram Approach
The tree diagram helps us visualize the sequential nature of the draws:
graph TD A[Start] --> B[First: Red 4/7] A --> C[First: Black 3/7] B --> D[Second: Red 3/6] B --> E[Second: Black 3/6] C --> F[Second: Red 4/6] C --> G[Second: Black 2/6]
Using the tree diagram:
P(both red) = \frac{4}{7} \cdot \frac{3}{6} = \frac{12}{42} = \frac{6}{21}
P(red then black) = \frac{4}{7} \cdot \frac{3}{6} = \frac{12}{42}
P(multi-colored) = P(red then black) + P(black then red)
= \frac{4}{7} \cdot \frac{3}{6} + \frac{3}{7} \cdot \frac{4}{6} = \frac{24}{42} = \frac{12}{21}
Method 3: Grid Diagram Analysis of Two-Ball Draws
Let’s visualize the ordered sample space using a grid where rows represent the second draw and columns represent the first draw:
First Draw → | R₁ | R₂ | R₃ | R₄ | B₁ | B₂ | B₃ |
---|---|---|---|---|---|---|---|
Second Draw ↓ | |||||||
R₁ | X | ⚫ | ⚫ | ⚫ | ⚪ | ⚪ | ⚪ |
R₂ | ⚫ | X | ⚫ | ⚫ | ⚪ | ⚪ | ⚪ |
R₃ | ⚫ | ⚫ | X | ⚫ | ⚪ | ⚪ | ⚪ |
R₄ | ⚫ | ⚫ | ⚫ | X | ⚪ | ⚪ | ⚪ |
B₁ | ⚪ | ⚪ | ⚪ | ⚪ | X | ⚫ | ⚫ |
B₂ | ⚪ | ⚪ | ⚪ | ⚪ | ⚫ | X | ⚫ |
B₃ | ⚪ | ⚪ | ⚪ | ⚪ | ⚫ | ⚫ | X |
Where:
- X: Impossible (same ball drawn twice)
- ⚫: Both same color (both red in upper-left, both black in lower-right)
- ⚪: Different colors (red-black or black-red)
From this grid:
- Both red = 12 outcomes (⚫ in upper-left quadrant)
- Both black = 6 outcomes (⚫ in lower-right quadrant)
- Red then black = 12 outcomes (⚪ in lower-left quadrant)
- Black then red = 12 outcomes (⚪ in upper-right quadrant)
- Total possible outcomes = 42 (all cells minus 7 diagonal X’s)
Analysis of Two-Ball Draws
Dividing the grid counts above by the 42 equally likely ordered outcomes gives:
- P(both red) = \frac{12}{42} = \frac{2}{7}
- P(both black) = \frac{6}{42} = \frac{1}{7}
- P(red then black) = \frac{12}{42} = \frac{2}{7}
- P(multi-colored) = \frac{24}{42} = \frac{4}{7} (includes both red-then-black and black-then-red)
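The grid counts can be reproduced by generating every ordered pair of distinct balls, which corresponds to every cell of the grid except the diagonal. This is a sketch with our own ball labels and dictionary keys.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R1", "R2", "R3", "R4", "B1", "B2", "B3"]

# Every (first draw, second draw) pair with two different balls
ordered_pairs = list(permutations(balls, 2))

counts = {
    "both red": 0,
    "both black": 0,
    "red then black": 0,
    "black then red": 0,
}
for first, second in ordered_pairs:
    colors = first[0] + second[0]  # e.g. "RB" for red then black
    if colors == "RR":
        counts["both red"] += 1
    elif colors == "BB":
        counts["both black"] += 1
    elif colors == "RB":
        counts["red then black"] += 1
    else:
        counts["black then red"] += 1

total = len(ordered_pairs)
print(total, counts)  # 42 cells; 12, 6, 12, 12
for label, count in counts.items():
    print(label, Fraction(count, total))
```

Counting ordered pairs (42 outcomes) and unordered pairs (21 outcomes) gives the same probabilities, because each unordered pair corresponds to exactly two ordered ones.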
Comparing the Methods
Each method highlights different aspects of the problem:
Counting Rules:
- Most efficient for calculation
- Helps understand combinations and arrangements
- May obscure the actual outcomes
Tree Diagram:
- Shows sequential nature of draws
- Makes conditional probability clear
- Visualizes how probabilities combine
- Good for checking intuition
Grid Diagram:
- Shows entire sample space explicitly
- Makes it clear why diagonal is impossible
- Helps visualize groups of outcomes
- Demonstrates why we divide by total possibilities
- Shows symmetry in the problem
17.11 Problem Solutions (1)
Problem 1: Two-Ball Drawing from an Urn
An urn contains 3 red, 2 blue, and 1 yellow balls. Two balls are drawn sequentially without replacement. We need to find the probability that the balls drawn are different colors.
Initial Conditions
Let’s first state our starting conditions:
- Total number of balls: n = 3 + 2 + 1 = 6
- Distribution of balls:
- Red: n_R = 3
- Blue: n_B = 2
- Yellow: n_Y = 1
Visual Representation
Let’s visualize all possible outcomes using a tree diagram:
graph TD A[Start] --> B["R (3/6)"] A --> C["B (2/6)"] A --> D["Y (1/6)"] B --> E["B (2/5)"] B --> F["Y (1/5)"] B --> G["R (2/5)"] C --> H["R (3/5)"] C --> I["Y (1/5)"] C --> J["B (1/5)"] D --> K["R (3/5)"] D --> L["B (2/5)"] D --> M["Y (0/5)"] E --> N["RB (Success)"] F --> O["RY (Success)"] G --> P["RR (Fail)"] H --> Q["BR (Success)"] I --> R["BY (Success)"] J --> S["BB (Fail)"] K --> T["YR (Success)"] L --> U["YB (Success)"] M --> V["YY (Fail)"]
Probability Calculation
Let’s calculate the probability of drawing different colors systematically:
- Starting with Red (probability \frac{3}{6}):
- Red → Blue: P(R,B) = \frac{3}{6} \cdot \frac{2}{5} = \frac{6}{30}
- Red → Yellow: P(R,Y) = \frac{3}{6} \cdot \frac{1}{5} = \frac{3}{30}
- Starting with Blue (probability \frac{2}{6}):
- Blue → Red: P(B,R) = \frac{2}{6} \cdot \frac{3}{5} = \frac{6}{30}
- Blue → Yellow: P(B,Y) = \frac{2}{6} \cdot \frac{1}{5} = \frac{2}{30}
- Starting with Yellow (probability \frac{1}{6}):
- Yellow → Red: P(Y,R) = \frac{1}{6} \cdot \frac{3}{5} = \frac{3}{30}
- Yellow → Blue: P(Y,B) = \frac{1}{6} \cdot \frac{2}{5} = \frac{2}{30}
Final Solution
The total probability of drawing two different colored balls is the sum of all favorable outcomes:
\begin{align*} P(\text{different colors}) &= P(R,B) + P(R,Y) + P(B,R) + P(B,Y) + P(Y,R) + P(Y,B) \\ &= \frac{6}{30} + \frac{3}{30} + \frac{6}{30} + \frac{2}{30} + \frac{3}{30} + \frac{2}{30} \\ &= \frac{22}{30} \\ &= \frac{11}{15} \\ &\approx 0.733 \text{ or } 73.3\% \end{align*}
Verification
This result aligns with our intuition because:
- The sample space contains more ways to draw different colors than same colors
- The complementary probability (drawing same colors) would be \frac{4}{15} or about 26.7%
- Since same-color draws are limited to RR, BB, and YY combinations, it makes sense that different-color draws are more likely
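As an independent check of the result and its complement, one can enumerate all ordered draws of two distinct balls. The sketch assumes individually labeled balls so that every ordered pair is equally likely; the labels are our own.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R1", "R2", "R3", "B1", "B2", "Y1"]  # 3 red, 2 blue, 1 yellow

ordered_pairs = list(permutations(balls, 2))  # 6 * 5 = 30 ordered draws

different = [pair for pair in ordered_pairs if pair[0][0] != pair[1][0]]
same = [pair for pair in ordered_pairs if pair[0][0] == pair[1][0]]

p_different = Fraction(len(different), len(ordered_pairs))
p_same = Fraction(len(same), len(ordered_pairs))

print(len(ordered_pairs), len(different), len(same))  # 30, 22, 8
print(p_different, p_same)                            # 11/15 and 4/15
```

The 8 same-color pairs are the 6 ordered RR pairs and the 2 ordered BB pairs; YY never occurs because there is only one yellow ball.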
Problem 2: Die and Coin Probability Exercise
Let’s analyze the probability of getting heads OR tails OR three dots when flipping a coin and rolling a die. This problem offers an excellent opportunity to explore probability unions and the importance of careful counting.
Understanding the Problem Space
In our experiment:
- We flip a coin (possible outcomes: heads, tails)
- We roll a die (possible outcomes: 1, 2, 3, 4, 5, 6 dots)
- These events occur simultaneously
Let’s start with a visualization:
graph TD A[Experiment] --> B[Coin] A --> C[Die] B --> D[Heads] B --> E[Tails] C --> F[1 dot] C --> G[2 dots] C --> H[3 dots] C --> I[4 dots] C --> J[5 dots] C --> K[6 dots]
Common Mistakes and Overcounting Analysis
A common first instinct might be to simply add the individual probabilities:
P(heads) + P(tails) + P(three dots) = \frac{1}{2} + \frac{1}{2} + \frac{1}{6} = \frac{7}{6}
This incorrect approach reveals several important issues:
- The result exceeds 1, which is impossible for a probability
- We’ve counted many outcomes multiple times
- We’ve failed to recognize event overlaps
Let’s analyze the overcounting:
graph TD A[Overcounting Analysis] --> B[Heads counted: 6/12] A --> C[Tails counted: 6/12] A --> D[Three dots counted: 2/12] B --> E[Including three with heads: 1/12] C --> F[Including three with tails: 1/12] E --> G[Double counted!] F --> G
Correct Solution Using Set Theory
Let’s solve this properly using set theory:
- Set H: All outcomes with heads
- Set T: All outcomes with tails
- Set 3: All outcomes with three dots
Key insights:
- Sets H and T are mutually exclusive
- Set 3 is entirely contained within H ∪ T
- Therefore, P(H ∪ T ∪ 3) = P(H ∪ T) = 1
We can write this formally:
P(H ∪ T ∪ 3) = P(H) + P(T) - P(H ∩ T) + P(3) - P(3 ∩ (H ∪ T)) = \frac{1}{2} + \frac{1}{2} - 0 + \frac{1}{6} - \frac{1}{6} = 1
Sample Space Analysis
graph TD A[Total Outcomes: 12] --> B[Heads: 6] A --> C[Tails: 6] B --> D[With three: 1] C --> E[With three: 1] D --> F[Already counted in heads] E --> G[Already counted in tails]
This visual representation helps us understand why:
- The sample space has 12 total outcomes (2 × 6)
- The three-dot outcomes are already included in heads and tails counts
- Adding P(three dots) would lead to double counting
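The counting argument above can be checked by listing the sample space explicitly. The following is a minimal sketch, assuming a fair coin and a fair die; the names `sample_space`, `heads`, `tails`, `three`, and `prob` are illustrative and not part of the original text.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair, all assumed equally likely.
sample_space = set(product(["H", "T"], range(1, 7)))

heads = {o for o in sample_space if o[0] == "H"}
tails = {o for o in sample_space if o[0] == "T"}
three = {o for o in sample_space if o[1] == 3}

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

# Naive sum counts the two three-dot outcomes twice and exceeds 1.
naive_sum = prob(heads) + prob(tails) + prob(three)

# The set union counts every outcome exactly once.
union_prob = prob(heads | tails | three)

print(naive_sum, union_prob)
```

Because `three` is a subset of `heads | tails`, taking the union adds no new outcomes, which is the overcounting point made in the diagram.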
Key Learning Points
This problem illustrates several fundamental probability concepts:
Exhaustive Events: Heads and tails together cover every possible coin outcome, so their union is already the whole sample space. Adding any further event to the union cannot raise the probability above 1.
Double Counting Protection: The inclusion-exclusion principle helps us avoid counting outcomes multiple times.
Sample Space Structure: Understanding your sample space structure (12 total outcomes) helps verify solution logic.
Problem 3: Laplace’s Two-Draw Probability Problem
Suppose there are two urns of coloured marbles:
- Urn X contains 3 black marbles, 1 white.
- Urn Y contains 1 black marble, 3 white.
I flip a fair coin to decide which urn to draw from, heads for Urn X and tails for Urn Y. Then I draw marbles at random.
Laplace asked what happens if we do two draws, with replacement. What’s the probability both draws will come up black?
Let’s solve this fascinating probability problem involving two draws with replacement. This is a particularly interesting case because the replacement aspect affects how we think about sequential probabilities.
Understanding the Initial Setup
First, let’s clarify our starting conditions:
Urn X (selected with heads):
- 3 black marbles, 1 white marble
- Total: 4 marbles
- P(black|X) = \frac{3}{4}
Urn Y (selected with tails):
- 1 black marble, 3 white marbles
- Total: 4 marbles
- P(black|Y) = \frac{1}{4}
Let’s visualize this with a tree diagram showing all possible paths:
graph TD A[Start] --> B[Urn X 1/2] A --> C[Urn Y 1/2] B --> D[Draw 1 Black 3/4] B --> E[Draw 1 White 1/4] C --> F[Draw 1 Black 1/4] C --> G[Draw 1 White 3/4] D --> H[Draw 2 Black 3/4] D --> I[Draw 2 White 1/4] E --> J[Draw 2 Black 3/4] E --> K[Draw 2 White 1/4] F --> L[Draw 2 Black 1/4] F --> M[Draw 2 White 3/4] G --> N[Draw 2 Black 1/4] G --> O[Draw 2 White 3/4]
Step-by-Step Solution
Let’s break this down into manageable steps:
- First, consider the urn selection:
- P(Urn X) = P(heads) = \frac{1}{2}
- P(Urn Y) = P(tails) = \frac{1}{2}
- For two black draws from Urn X:
- P(black and black|X) = \frac{3}{4} \times \frac{3}{4} = \frac{9}{16}
- P(X and both black) = \frac{1}{2} \times \frac{9}{16} = \frac{9}{32}
- For two black draws from Urn Y:
- P(black and black|Y) = \frac{1}{4} \times \frac{1}{4} = \frac{1}{16}
- P(Y and both black) = \frac{1}{2} \times \frac{1}{16} = \frac{1}{32}
- Total probability (using the law of total probability): P(both black) = P(X and both black) + P(Y and both black) = \frac{9}{32} + \frac{1}{32} = \frac{10}{32} = \frac{5}{16} = 0.3125, or exactly 31.25%
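The steps above can be reproduced with exact fractions. This is a small sketch using only the numbers from the problem statement; the dictionary names are illustrative. It also computes the probability that a single draw is black, so the result can be compared with what unconditional independence would predict.

```python
from fractions import Fraction

# Probabilities taken from the problem statement.
p_urn = {"X": Fraction(1, 2), "Y": Fraction(1, 2)}  # fair coin picks the urn
p_black_given_urn = {"X": Fraction(3, 4), "Y": Fraction(1, 4)}

# Law of total probability: weight each urn's contribution by P(urn).
p_one_black = sum(p_urn[u] * p_black_given_urn[u] for u in p_urn)

# With replacement, the two draws are independent once the urn is fixed,
# so the per-urn probability of two blacks is P(black | urn) squared.
p_both_black = sum(p_urn[u] * p_black_given_urn[u] ** 2 for u in p_urn)

print(p_both_black)      # exact probability of two black draws
print(p_one_black ** 2)  # value that unconditional independence would give
```

The two printed values differ, which is one way to see that the draws are independent only after conditioning on the urn.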
Key Insights from This Problem
- Replacement Matters:
- Because we replace after the first draw, the probabilities remain constant for the second draw
- This is different from drawing without replacement, where probabilities would change
- Conditional Independence:
- Once we know which urn we’re using, the draws are independent
- However, the draws are not unconditionally independent
- Law of Total Probability:
- We needed to consider both paths (Urn X and Urn Y) to find the total probability
- Each path’s contribution is weighted by the probability of selecting that urn
17.12 Core Probability Rules
The Complement Rule
The complement rule is one of the most fundamental concepts in probability theory. For any event A, there’s always the possibility that A doesn’t occur. We call this the complement of A, written as A' or A^c.
The complement rule states:
P(A') = 1 - P(A)
This makes intuitive sense because any outcome must either be in A or in A’ (but not both), and something must happen (the total probability must be 1).
Real-World Example: Consider a weather forecast that predicts a 70% chance of rain tomorrow. Using the complement rule, we can immediately calculate that there’s a 30% chance it won’t rain:
P(\text{no rain}) = 1 - P(\text{rain}) = 1 - 0.70 = 0.30
Another Example: In a game of roulette, what’s the probability of not landing on red? There are 18 red numbers, 18 black numbers, and 2 green numbers (0 and 00) on a roulette wheel. Therefore:
P(\text{red}) = \frac{18}{38}
P(\text{not red}) = 1 - \frac{18}{38} = \frac{20}{38} = \frac{10}{19} \approx 0.526
The Addition/Sum Rule
When we want to find the probability of either one event OR another occurring, we use the addition rule. However, we need to be careful about double-counting outcomes that are in both events.
For any two events A and B:
P(A \cup B) = P(A) + P(B) - P(A \cap B)
The term P(A \cap B) represents the probability of both events occurring simultaneously. We subtract it to avoid counting these outcomes twice.
Real-World Example: In a college class, 65% of students play sports, 45% are in clubs, and 25% do both. What percentage of students are involved in either sports or clubs?
P(\text{sports or clubs}) = 65\% + 45\% - 25\% = 85\%
For mutually exclusive events (events that cannot occur simultaneously), P(A \cap B) = 0, so the formula simplifies to:
P(A \cup B) = P(A) + P(B)
Example: When rolling a die, what’s the probability of rolling either a 1 or a 6? Since these outcomes can’t happen simultaneously:
P(1 \text{ or } 6) = P(1) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}
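The addition rule can be sketched with Python sets, where `&` plays the role of intersection and `|` the role of union. This assumes a fair six-sided die; the overlapping events `even` and `at_least_four` are hypothetical examples added for illustration and do not appear in the text above.

```python
from fractions import Fraction

die = set(range(1, 7))  # equally likely faces of a fair die

def prob(event):
    """Classical probability of an event (a subset of the die faces)."""
    return Fraction(len(event & die), len(die))

def union_prob(a, b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return prob(a) + prob(b) - prob(a & b)

# Mutually exclusive events from the text: rolling a 1 or a 6.
print(union_prob({1}, {6}))

# Overlapping events (hypothetical): an even number or at least 4.
even, at_least_four = {2, 4, 6}, {4, 5, 6}
print(union_prob(even, at_least_four), prob(even | at_least_four))
```

For the overlapping pair, the formula and the direct count of the union agree only because the shared outcomes 4 and 6 are subtracted once.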
Conditional Probability, the Multiplication Rule, and Bayes’ Theorem
Understanding Conditional Probability
Conditional probability represents how the probability of one event changes when we have information about another event. Let’s start with a simple example:
Imagine you have a deck of 52 playing cards. What’s the probability of drawing a King given that you’ve drawn a face card (Jack, Queen, or King)?
To solve this:
- Total face cards = 12 (4 each of Jack, Queen, King)
- Number of Kings = 4
- P(\text{King}|\text{Face Card}) = \frac{P(\text{King} \cap \text{Face Card})}{P(\text{Face Card})} = \frac{4/52}{12/52} = \frac{1}{3}
This illustrates the fundamental formula for conditional probability:
P(A|B) = \frac{P(A \cap B)}{P(B)}
In the context of classical probability, where all outcomes are equally likely, we can express these probabilities in terms of the number of favorable outcomes:
P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{|A \cap B|/|S|}{|B|/|S|} = \frac{|A \cap B|}{|B|}
where:
- |A \cap B| represents the number of outcomes in both events A and B
- |B| represents the number of outcomes in event B
- |S| represents the total number of outcomes in the sample space
This is why in our card example: P(\text{King}|\text{Face Card}) = \frac{|\text{Kings}|}{|\text{Face Cards}|} = \frac{4}{12} = \frac{1}{3}
The formula shows that in classical probability, conditional probability is simply the ratio of the number of outcomes favorable to both events to the number of outcomes in the conditioning event.
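The counting form of conditional probability translates directly into code. The sketch below builds a standard 52-card deck and applies |A ∩ B| / |B|; the helper name `cond_prob` and the card representation are illustrative choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely cards

kings = {card for card in deck if card[0] == "K"}
faces = {card for card in deck if card[0] in {"J", "Q", "K"}}

def cond_prob(a, b):
    """P(A|B) for equally likely outcomes: |A and B| / |B|."""
    if not b:
        raise ValueError("conditioning event must be non-empty")
    return Fraction(len(a & b), len(b))

print(cond_prob(kings, faces))
```

Note that the size of the full deck never enters `cond_prob`: conditioning on B makes B the new sample space.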
Medical Testing Example
Let’s explore a more practical example involving medical testing. Consider a disease that affects 1% of the population and a test T with the following characteristics:
- Sensitivity (true positive rate): P(T^+|D^+) = 0.95
- Specificity (true negative rate): P(T^-|D^-) = 0.98
We can organize this information in a cross table for a population of 10,000:
Test/Disease | D^+ Present | D^- Absent | Total |
---|---|---|---|
T^+ | 95 | 198 | 293 |
T^- | 5 | 9,702 | 9,707 |
Total | 100 | 9,900 | 10,000 |
Using this, we can calculate important probabilities like:
- Positive Predictive Value: P(D^+|T^+) = \frac{95}{293} \approx 0.32
- Negative Predictive Value: P(D^-|T^-) = \frac{9,702}{9,707} \approx 0.999
Note that:
- P(D^+) = 0.01 (disease prevalence)
- P(T^+|D^-) = 0.02 (false positive rate = 1 - specificity)
- P(T^-|D^+) = 0.05 (false negative rate = 1 - sensitivity)
These probabilities show how a test with seemingly good characteristics (95% sensitivity and 98% specificity) can still lead to many false positives when the condition being tested for is rare in the population.
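The cross table can be rebuilt from the three inputs (prevalence, sensitivity, specificity). The sketch below uses exact fractions to avoid rounding noise; the function name `confusion_counts` is an assumption made for illustration.

```python
from fractions import Fraction

def confusion_counts(population, prevalence, sensitivity, specificity):
    """Expected cell counts of the test/disease cross table."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = diseased * sensitivity   # diseased and test positive
    fn = diseased - tp            # diseased but test negative
    tn = healthy * specificity    # healthy and test negative
    fp = healthy - tn             # healthy but test positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

counts = confusion_counts(
    10_000, Fraction("0.01"), Fraction("0.95"), Fraction("0.98")
)
ppv = counts["TP"] / (counts["TP"] + counts["FP"])  # P(D+ | T+)
npv = counts["TN"] / (counts["TN"] + counts["FN"])  # P(D- | T-)

print({cell: int(n) for cell, n in counts.items()})
print(round(float(ppv), 3), round(float(npv), 4))
```

Changing `prevalence` while keeping the test characteristics fixed shows how strongly the positive predictive value depends on how common the disease is.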
Consider an urn containing:
- 3 blue marbles (B): 2 marked with a star (S^+) and 1 unmarked (S^-)
- 2 red marbles (R): 1 marked with a star (S^+) and 1 unmarked (S^-)
Let’s calculate the probability of drawing a starred marble given that we drew a blue marble.
Using the tree diagram:
graph LR Ω{Ω} --> B[B: 3/5] Ω --> R[R: 2/5] B --> BS+["S⁺|B: 2/3"] B --> BS-["S⁻|B: 1/3"] R --> RS+["S⁺|R: 1/2"] R --> RS-["S⁻|R: 1/2"] style Ω fill:#e6e6ff,stroke:#333,stroke-width:2px,color:#000 style B fill:#ccf2ff,stroke:#333,stroke-width:2px,color:#000 style R fill:#ffe6e6,stroke:#333,stroke-width:2px,color:#000 style BS+ fill:#f2f2f2,stroke:#333,stroke-width:1px,color:#000 style BS- fill:#f2f2f2,stroke:#333,stroke-width:1px,color:#000 style RS+ fill:#f2f2f2,stroke:#333,stroke-width:1px,color:#000 style RS- fill:#f2f2f2,stroke:#333,stroke-width:1px,color:#000 linkStyle default stroke:#333,stroke-width:1px
From this tree, we can calculate:
- P(B) = \frac{3}{5}
- P(S^+|B) = \frac{2}{3} (probability of star given blue)
- P(B \cap S^+) = \frac{3}{5} \cdot \frac{2}{3} = \frac{2}{5}
We can verify the conditional probability formula:
P(S^+|B) = \frac{P(B \cap S^+)}{P(B)} = \frac{2/5}{3/5} = \frac{2}{3}
Other probabilities from this scenario:
- P(R) = \frac{2}{5}
- P(S^+|R) = \frac{1}{2}
- P(S^+) = P(B)P(S^+|B) + P(R)P(S^+|R) = \frac{3}{5} \cdot \frac{2}{3} + \frac{2}{5} \cdot \frac{1}{2} = \frac{3}{5}
This example illustrates how:
- The probability tree helps visualize sequential events
- Branch probabilities multiply along paths
- The conditional probability formula naturally emerges from the tree structure
- The law of total probability can be visualized as summing across different paths
Consider drawing two balls from an urn containing 3 blue (B) and 2 red (R) balls without replacement. Let’s find the probability of drawing two blue balls.
graph LR Ω{Start} --> B1["B₁: 3/5"] Ω --> R1["R₁: 2/5"] B1 --> B2["B₂|B₁: 2/4"] B1 --> R2["R₂|B₁: 2/4"] R1 --> B3["B₂|R₁: 3/4"] R1 --> R3["R₂|R₁: 1/4"] style Ω fill:#e6e6ff,stroke:#333,stroke-width:2px,color:#000 style B1 fill:#ccf2ff,stroke:#333,stroke-width:2px,color:#000 style R1 fill:#ffe6e6,stroke:#333,stroke-width:2px,color:#000 style B2 fill:#ccf2ff,stroke:#333,stroke-width:1px,color:#000 style R2 fill:#ffe6e6,stroke:#333,stroke-width:1px,color:#000 style B3 fill:#ccf2ff,stroke:#333,stroke-width:1px,color:#000 style R3 fill:#ffe6e6,stroke:#333,stroke-width:1px,color:#000 linkStyle default stroke:#333,stroke-width:1px
Let’s calculate various probabilities:
- Two blue balls (B₁ and B₂):
- P(B_1) = \frac{3}{5} (first draw)
- P(B_2|B_1) = \frac{2}{4} (second draw given first was blue)
- P(B_1 \cap B_2) = \frac{3}{5} \cdot \frac{2}{4} = \frac{3}{10}
- Blue then Red:
- P(R_2|B_1) = \frac{2}{4}
- P(B_1 \cap R_2) = \frac{3}{5} \cdot \frac{2}{4} = \frac{3}{10}
- Red then Blue:
- P(R_1) = \frac{2}{5}
- P(B_2|R_1) = \frac{3}{4}
- P(R_1 \cap B_2) = \frac{2}{5} \cdot \frac{3}{4} = \frac{3}{10}
- Two red balls:
- P(R_2|R_1) = \frac{1}{4}
- P(R_1 \cap R_2) = \frac{2}{5} \cdot \frac{1}{4} = \frac{1}{10}
Key observations:
- The probability of second draw depends on first outcome (conditional probability)
- Total probability = 1: \frac{3}{10} + \frac{3}{10} + \frac{3}{10} + \frac{1}{10} = 1
- Notice that P(B_1 \cap R_2) = P(R_1 \cap B_2) due to symmetry
- The denominator changes after first draw (4 balls remain)
This example illustrates how:
- Probabilities update based on previous outcomes
- The multiplication rule applies to sequential events
- Sample space reduces after each draw
- Order can matter in sequential probability calculations
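The four path probabilities can be confirmed by brute force: list every ordered pair of distinct balls and count the colour patterns. This is a sketch for checking the arithmetic, not a general method; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

balls = ["B", "B", "B", "R", "R"]  # 3 blue, 2 red

# permutations() treats the five balls as distinct objects, so it yields
# all 5 * 4 = 20 equally likely ordered draws without replacement.
ordered_draws = list(permutations(balls, 2))
pattern_counts = Counter(ordered_draws)

probs = {
    pattern: Fraction(count, len(ordered_draws))
    for pattern, count in pattern_counts.items()
}

for pattern, p in sorted(probs.items()):
    print(pattern, p)
```

The enumeration gives the same values as multiplying along the branches of the tree, and the four probabilities sum to 1.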
Conditional probability answers the question: “Given that we know event B has occurred, what is the probability that event A will occur?” We write this as P(A|B), read as “the probability of A given B.”
The formal definition is:
P(A|B) = \frac{P(A \cap B)}{P(B)}
Geometrically, we can visualize this as:
- The original sample space \Omega represented as a rectangle
- Event B as a region within \Omega
- The intersection A \cap B as the overlap between regions A and B
- Conditional probability as the ratio of the overlap area to the area of B
This visualization helps understand why we divide by P(B) - we’re essentially creating a new probability space where B is our universe.
Imagine these two questions:
- What’s the probability it’s raining (A) given there are clouds (B)?
- What’s the probability there are clouds (B) given it’s raining (A)?
Clearly, these are different:
- P(A|B): Among all cloudy days, how many are rainy?
- P(B|A): Among all rainy days, how many are cloudy?
P(B|A) would be close to 1 (almost all rainy days have clouds), while P(A|B) might be around 0.3 (not all cloudy days bring rain).
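When are they equal?
- Bayes’ theorem links the two: P(A|B) = \frac{P(B|A)P(A)}{P(B)}
- So, whenever P(A \cap B) > 0, P(A|B) = P(B|A) holds exactly when P(A) = P(B)
- If P(A \cap B) = 0, both conditional probabilities are 0 and are trivially equal
- Independence alone is not enough:
- It gives P(A|B) = P(A) and P(B|A) = P(B), which agree only if P(A) = P(B)
- Drawing cards shows the typical asymmetry:
- P(\text{red}|\text{face}) = \frac{6}{12} = \frac{1}{2}, but P(\text{face}|\text{red}) = \frac{6}{26} = \frac{3}{13}
- They differ because P(\text{face}) = \frac{12}{52} \neq P(\text{red}) = \frac{26}{52}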
The Law of Total Probability
Imagine a sample space \Omega as a complete population where every individual must be classified into exactly one category. This is what a partition does - it divides our universe of possibilities into distinct, non-overlapping groups that together include all possibilities.
Consider how we might partition a population:
- Let A_1 = “Category 1”
- Let A_2 = “Category 2”
- Let A_3 = “Category 3”
These classifications form a partition because:
- Complete Coverage (\Omega = A_1 \cup A_2 \cup ... \cup A_n):
- Every outcome in the sample space must belong to at least one category
- Nothing can be left unclassified
- The categories together capture all possibilities
- Mutual Exclusivity (A_i \cap A_j = \emptyset for i \neq j):
- No outcome belongs to more than one category
- Categories cannot overlap
- Being in one category excludes being in any other
This framework becomes powerful when:
- Calculating total probability (sum across all categories)
- Updating beliefs with new information
- Breaking complex problems into manageable pieces
Think of it like organizing a filing system: each document must go into exactly one folder (mutual exclusivity), and every document must be filed somewhere (complete coverage). When we get new information, we might need to update our filing system, but we always maintain these two key properties.
The power of partitioning lies in its ability to help us systematically organize possibilities and update probabilities as new information becomes available. This forms the foundation for understanding more complex concepts like the law of total probability and Bayes’ theorem.
The law of total probability is a fundamental bridge between conditional probabilities and overall probabilities. Given a partition \{A_1, A_2, ..., A_n\} of the sample space, for any event B:
P(B) = \sum_{i=1}^n P(B|A_i)P(A_i)
Visually, this represents:
- Breaking the sample space into disjoint “slices” (the partition)
- Finding the probability of B within each slice (P(B|A_i))
- Weighting each slice by its probability (P(A_i))
- Summing all contributions
Example: In a tech company:
- 40% of employees are developers (A_1)
- 35% are managers (A_2)
- 25% are other roles (A_3)
To find the probability of an employee working remotely (B):
- 80% of developers work remotely: P(B|A_1) = 0.80
- 60% of managers work remotely: P(B|A_2) = 0.60
- 40% of other roles work remotely: P(B|A_3) = 0.40
Using the law of total probability:
P(B) = (0.80)(0.40) + (0.60)(0.35) + (0.40)(0.25) = 0.32 + 0.21 + 0.10 = 0.63
So 63% of all employees work remotely.
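The weighted sum can be written as a small reusable function. This is a sketch under the assumption that the roles form a partition (their shares sum to 1); the function name `total_probability` and the dictionary keys are illustrative.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(B) = sum over i of P(B|A_i) * P(A_i) for a partition {A_i}."""
    if sum(priors.values()) != 1:
        raise ValueError("partition probabilities must sum to 1")
    return sum(likelihoods[name] * priors[name] for name in priors)

# Shares of each role, P(A_i), from the example.
role_share = {
    "developer": Fraction("0.40"),
    "manager": Fraction("0.35"),
    "other": Fraction("0.25"),
}
# Remote-work rate within each role, P(B|A_i).
remote_rate = {
    "developer": Fraction("0.80"),
    "manager": Fraction("0.60"),
    "other": Fraction("0.40"),
}

p_remote = total_probability(role_share, remote_rate)
print(float(p_remote))
```

Using `Fraction` with string inputs keeps the arithmetic exact, so the three products and their sum can be compared term by term with the hand calculation.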
The Multiplication Rule
The conditional probability formula can be rearranged to give us the multiplication rule:
P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)
This symmetry is crucial because it shows:
- We can compute joint probabilities in two ways
- The order of conditioning doesn’t matter
- Both perspectives must yield the same result
Bayes’ Theorem: From Prior to Posterior Beliefs
Let’s derive Bayes’ theorem starting from the fundamental definitions of conditional probability and using the multiplication rule.
Starting Point: Conditional Probability
The conditional probability formula for two events A and B is:
P(A|B) = \frac{P(A \cap B)}{P(B)}
Similarly, we can write:
P(B|A) = \frac{P(A \cap B)}{P(A)}
Multiplication Rule
From either of these formulas, we can derive the multiplication rule:
P(A \cap B) = P(B|A) \cdot P(A) or equivalently P(A \cap B) = P(A|B) \cdot P(B)
Deriving Bayes’ Theorem
Start with the conditional probability formula:
P(A|B) = \frac{P(A \cap B)}{P(B)}
Use the multiplication rule to express the intersection:
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
This gives us Bayes’ theorem in its basic form. The denominator P(B) can be expanded using the law of total probability:
P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)
Leading to the full form:
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)}
This derivation shows how Bayes’ theorem emerges naturally from the basic rules of probability, allowing us to “reverse” conditional probabilities and update prior beliefs with new evidence.
The formula has three key components:
- Prior probability: P(A) - initial belief about event A
- Likelihood: P(B|A) - probability of evidence B given A is true
- Normalizing constant: P(B) - ensures probabilities sum to 1
Example 1: Playing Cards
Let’s start with a simple example using cards:
Suppose we draw a card and are told it’s red. What’s the probability it’s a face card?
Given:
- P(\text{Face}) = 12/52 = 3/13 (prior)
- P(\text{Red}|\text{Face}) = 6/12 = 1/2 (likelihood)
- P(\text{Red}) = 26/52 = 1/2 (normalizing constant)
Using Bayes’ theorem: P(\text{Face}|\text{Red}) = \frac{(1/2)(3/13)}{1/2} = \frac{3}{13}
Example 2: Medical Testing
A more practical application involves medical diagnostics. Let’s formalize the terminology:
Testing Framework:
- Conditions:
- D^+: Disease present
- D^-: Disease absent
- Test Results:
- T^+: Positive test
- T^-: Negative test
Key Metrics:
- Sensitivity: P(T^+|D^+) - True Positive Rate
- Specificity: P(T^-|D^-) - True Negative Rate
- PPV: P(D^+|T^+) - Positive Predictive Value
- NPV: P(D^-|T^-) - Negative Predictive Value
These relationships can be visualized in a confusion matrix:
| | D^+ | D^- |
|---|---|---|
| T^+ | TP | FP |
| T^- | FN | TN |
where:
- TP: True Positives
- TN: True Negatives
- FP: False Positives (Type I error)
- FN: False Negatives (Type II error)
Example Calculation: Consider a test for a rare disease where:
- Prevalence: P(D^+) = 0.01 (1%)
- Sensitivity: P(T^+|D^+) = 0.95 (95%)
- Specificity: P(T^-|D^-) = 0.98 (98%)
What’s the probability of having the disease given a positive test?
Using Bayes’ theorem:
P(D^+|T^+) = \frac{P(T^+|D^+) \cdot P(D^+)}{P(T^+|D^+) \cdot P(D^+) + P(T^+|D^-) \cdot P(D^-)}
P(D^+|T^+) = \frac{(0.95)(0.01)}{(0.95)(0.01) + (0.02)(0.99)} \approx 0.32
This counterintuitive result (only 32% chance of disease despite a positive test) illustrates the base rate fallacy - when the condition is rare, even a highly accurate test can have a low positive predictive value.
This example shows why Bayes’ theorem is crucial in medical decision-making, as it properly accounts for both the test’s accuracy and the disease’s prevalence in the population.
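The two-hypothesis form of Bayes’ theorem fits in a few lines. In the sketch below, the function name `bayes_posterior` and its parameter names are illustrative; the extra prevalence values in the loop are hypothetical and are included only to show how the posterior moves with the base rate.

```python
def bayes_posterior(prior, likelihood, likelihood_given_complement):
    """P(A|B) from P(A), P(B|A) and P(B|A^c), using the full form of Bayes' theorem."""
    evidence = likelihood * prior + likelihood_given_complement * (1 - prior)
    return likelihood * prior / evidence

sensitivity = 0.95
specificity = 0.98
false_positive_rate = 1 - specificity  # P(T+ | D-)

# Value from the text: prevalence of 1%.
ppv = bayes_posterior(0.01, sensitivity, false_positive_rate)
print(round(ppv, 3))

# Hypothetical prevalences, to show the effect of the base rate.
for prevalence in (0.001, 0.01, 0.10):
    value = bayes_posterior(prevalence, sensitivity, false_positive_rate)
    print(prevalence, round(value, 3))
```

With the test characteristics held fixed, the posterior rises steeply as the condition becomes more common, which is the base rate effect described above.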
Spam Filtering Example
Email spam filters are a perfect real-world application of Bayes’ theorem. Let’s see how it works:
Consider a single word “lottery” in an email. We want to know: given that an email contains this word, what’s the probability it’s spam?
Let’s define our events:
- S: Email is spam
- W: Email contains the word “lottery”
We need:
- Prior probability: P(S) = 0.30 (30% of all emails are spam)
- Likelihood: P(W|S) = 0.20 (20% of spam emails contain “lottery”)
- False positive rate: P(W|S') = 0.001 (0.1% of legitimate emails contain “lottery”)
Using Bayes’ theorem:
P(S|W) = \frac{(0.20)(0.30)}{(0.20)(0.30) + (0.001)(0.70)} = \frac{0.06}{0.0607} \approx 0.988
So if an email contains “lottery,” there’s a 98.8% chance it’s spam!
Real spam filters:
- Look at multiple words and features
- Update probabilities continuously based on user feedback
- Combine evidence using the multiplication rule, treating words as conditionally independent given the class (the “naive” assumption)
- Use logarithms to avoid numerical underflow with many multiplications
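The last two points can be illustrated with a toy calculation in log-odds form. This is a minimal sketch, not how any particular spam filter is implemented: it assumes words are conditionally independent given the class, the function name `spam_posterior` is made up for illustration, and the probabilities for the word "meeting" are hypothetical (only the "lottery" numbers come from the text).

```python
import math

def spam_posterior(words, prior_spam, p_word_given_spam, p_word_given_ham):
    """Posterior P(spam | words) under a naive independence assumption.

    Sums log-likelihood ratios instead of multiplying probabilities,
    which avoids underflow when many words are combined.
    """
    log_odds = math.log(prior_spam) - math.log(1 - prior_spam)
    for word in words:
        log_odds += math.log(p_word_given_spam[word])
        log_odds -= math.log(p_word_given_ham[word])
    return 1 / (1 + math.exp(-log_odds))  # convert log-odds back to probability

p_word_given_spam = {"lottery": 0.20, "meeting": 0.01}   # "meeting" is hypothetical
p_word_given_ham = {"lottery": 0.001, "meeting": 0.10}   # "meeting" is hypothetical

one_word = spam_posterior(["lottery"], 0.30, p_word_given_spam, p_word_given_ham)
two_words = spam_posterior(
    ["lottery", "meeting"], 0.30, p_word_given_spam, p_word_given_ham
)
print(round(one_word, 3), round(two_words, 3))
```

With a single word the function reduces to the Bayes calculation above; a second word that is more typical of legitimate mail pulls the posterior back down.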
Weather Forecasting Example
Another practical application is weather forecasting. Suppose we want to know if it will rain tomorrow given certain atmospheric conditions:
Let’s define:
- R: It rains tomorrow
- C: Current atmospheric conditions (high pressure system)
Given:
- P(R) = 0.25 (25% chance of rain on any day)
- P(C|R) = 0.10 (10% of rainy days have high pressure)
- P(C|R') = 0.70 (70% of non-rainy days have high pressure)
Using Bayes’ theorem:
P(R|C) = \frac{(0.10)(0.25)}{(0.10)(0.25) + (0.70)(0.75)} \approx 0.045
So given high pressure, there’s only about a 4.5% chance of rain tomorrow.
Key Interconnections and Applications
These concepts form a unified framework with wide-ranging applications:
- Conditional probability provides the foundation for understanding dependent events
- The multiplication rule enables complex probability calculations
- Total probability helps break down complex scenarios into manageable pieces
- Bayes’ theorem combines these tools to update probabilities with new evidence
Modern Applications:
- Machine Learning: Naive Bayes classifiers for text categorization
- Medical Diagnosis: Interpreting test results and screening procedures
- Quality Control: Identifying defective products based on test results
- Risk Assessment: Updating risk probabilities with new information
- Natural Language Processing: Sentiment analysis and language modeling
- Forensics: Evaluating evidence in legal cases
- Recommender Systems: Predicting user preferences
Understanding these relationships helps in:
- Choosing the right probabilistic tool for a given problem
- Breaking complex problems into manageable pieces
- Avoiding common probability misconceptions
- Making better decisions under uncertainty
- Building intuition for machine learning algorithms
Independent and Disjoint Events
Understanding the difference between independent and disjoint events is crucial for correctly applying probability rules. The key insight is that these concepts are fundamentally different - in fact, disjoint events are always dependent (except in trivial cases).
Independent Events
Events A and B are independent if knowing that one occurred doesn’t affect the probability of the other occurring. Mathematically, this means any of these equivalent conditions:
- P(A|B) = P(A)
- P(B|A) = P(B)
- P(A \cap B) = P(A) \cdot P(B)
Example 1: Flipping a fair coin twice
- Let A = “heads on first flip” and B = “heads on second flip”
- P(A) = \frac{1}{2} and P(B) = \frac{1}{2}
- P(A \cap B) = \frac{1}{4} = P(A) \cdot P(B)
- Therefore, the flips are independent
Disjoint (Mutually Exclusive) Events
Events A and B are disjoint if they cannot occur simultaneously:
P(A \cap B) = 0
Example 2: Rolling a die
- Let A = “rolling a 6” and B = “rolling an odd number”
- P(A \cap B) = 0 (can’t be both 6 and odd)
- These events are disjoint
Why Disjoint Events are Always Dependent
Let’s prove that disjoint events (with non-zero probabilities) must be dependent:
For disjoint events: P(A \cap B) = 0
For events to be independent, we need: P(A \cap B) = P(A) \cdot P(B)
Therefore, for disjoint events to be independent:
- 0 = P(A) \cdot P(B) (combining the two conditions above)
- This equation is only true if either P(A) = 0 or P(B) = 0
- But if either probability is 0, the event is impossible and trivial
For any non-trivial disjoint events:
- Assume P(A) > 0 and P(B) > 0 (considering non-trivial cases)
- Then P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{0}{P(A)} = 0
- Since P(B) > 0 (by our assumption of non-trivial cases)
- We have P(B|A) = 0 \neq P(B)
- This inequality proves the events are dependent
The assumption P(B) \neq 0 is crucial because:
- If we allowed P(B) = 0, then P(B|A) = P(B) would be true (both equal to 0)
- This would mean the events are technically independent
- But this is a trivial case where B is an impossible event
Example 3: Rolling a die illustrates dependence
- Let A = “rolling a 1” and B = “rolling a 2”
- P(A) = \frac{1}{6} and P(B) = \frac{1}{6} (both non-zero)
- P(B|A) = 0 (if we rolled a 1, we definitely didn’t roll a 2)
- But P(B) = \frac{1}{6}
- Therefore P(B|A) \neq P(B), showing dependence
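Both definitions can be tested mechanically on a finite sample space. The sketch below assumes equally likely outcomes; the helper names `prob`, `independent`, and `disjoint` are illustrative.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Classical probability of an event within a finite sample space."""
    return Fraction(len(event & space), len(space))

def independent(a, b, space):
    """Product test: P(A and B) == P(A) * P(B)."""
    return prob(a & b, space) == prob(a, space) * prob(b, space)

def disjoint(a, b):
    """Events are disjoint when they share no outcome."""
    return not (a & b)

# Two fair coin flips: heads on first flip vs heads on second flip.
flips = set(product("HT", repeat=2))
first_heads = {o for o in flips if o[0] == "H"}
second_heads = {o for o in flips if o[1] == "H"}
print(independent(first_heads, second_heads, flips), disjoint(first_heads, second_heads))

# One die roll: rolling a 1 vs rolling a 2.
die = set(range(1, 7))
print(independent({1}, {2}, die), disjoint({1}, {2}))
```

The coin events pass the product test and overlap; the die events share no outcome and fail the product test, matching the proof above.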
Key Insights
- Independence means events don’t affect each other’s probabilities
- Disjoint means events can’t occur together
- These concepts are almost opposites:
- Independent events can occur together
- Disjoint events must affect each other’s probabilities
- The only case where events can be both independent and disjoint is when at least one event has probability 0 (impossible event)
This understanding is crucial for correctly applying probability rules and avoiding common misconceptions in probability calculations.
17.13 Problem Solutions (2)
Problem 1: Colored Balls - At Least One Red
Question: A bag contains 5 red and 3 blue marbles. Two marbles are drawn simultaneously from the bag. What is the probability that at least one marble is red?
Detailed Solution:
Approach 1: Using Tree Diagram and Complement Rule
The key idea here is that finding the probability of “at least one red” directly can be complex, but finding its complement - “no red marbles” (both blue) - is simpler.
Then we can use P(\text{at least one red}) = 1 - P(\text{no red}).
Let’s visualize this with a diagram:
graph TD A[Start] --> B[First Draw] B --> C["Blue (3/8)"] B --> D["Red (5/8)"] C --> E["Blue (2/7)"] C --> F["Red (5/7)"] D --> G["Blue (3/7)"] D --> H["Red (4/7)"] E --> I["P = 3/8 * 2/7<br/>All Blue"] F --> J["P = 3/8 * 5/7<br/>At least one Red"] G --> K["P = 5/8 * 3/7<br/>At least one Red"] H --> L["P = 5/8 * 4/7<br/>At least one Red"] style I fill:#f9f9f9,stroke:#333 style J fill:#f9f9f9,stroke:#333 style K fill:#f9f9f9,stroke:#333 style L fill:#f9f9f9,stroke:#333
Now let’s solve using the complement rule:
P(\text{at least one red}) = 1 - P(\text{no red})
To get no red marbles, we need to draw both blue marbles:
- Total marbles: 8
- Blue marbles: 3
- P(\text{first blue}) = \frac{3}{8}
- P(\text{second blue}|\text{first blue}) = \frac{2}{7}
P(\text{no red}) = P(\text{both blue}) = \frac{3}{8} \times \frac{2}{7} = \frac{6}{56} = \frac{3}{28}
Therefore:
P(\text{at least one red}) = 1 - \frac{3}{28} = \frac{25}{28} \approx 0.893 or about 89.3%
Approach 2: Using Combinations
The combinations approach involves finding all possible ways to select 2 marbles out of 8, then subtracting the unfavorable outcomes (selecting 2 blue marbles).
Let’s understand combinations first:
- A combination represents the number of ways to select r items from n items where order doesn’t matter
- Notation: C(n,r) or \binom{n}{r}
- Formula: C(n,r) = \frac{n!}{r!(n-r)!}
For this problem:
- Total possible outcomes = C(8,2) = 28 ways to select 2 marbles from 8
- Unfavorable outcomes = C(3,2) = 3 ways to select 2 blue marbles from 3 blue marbles
- Favorable outcomes = C(8,2) - C(3,2) = 28 - 3 = 25
Therefore:
P(\text{at least one red}) = \frac{25}{28} \approx 0.893 or about 89.3%
Both methods give us the same result! The combinations approach is often more elegant for problems involving simultaneous selection, while the tree diagram approach helps visualize the problem better and is particularly useful when events happen in sequence.
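The combinations approach can be checked in two ways: with the closed-form count from `math.comb`, and by listing every unordered pair. The sketch below labels the marbles by position so that marbles of the same colour stay distinguishable; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

marbles = ["R"] * 5 + ["B"] * 3  # 5 red, 3 blue

# Closed form: all pairs minus the all-blue pairs.
total_pairs = comb(8, 2)
all_blue_pairs = comb(3, 2)
p_formula = Fraction(total_pairs - all_blue_pairs, total_pairs)

# Enumeration: pick two distinct positions and inspect their colours.
pairs = list(combinations(range(len(marbles)), 2))
with_red = [pair for pair in pairs if any(marbles[i] == "R" for i in pair)]
p_enumerated = Fraction(len(with_red), len(pairs))

print(p_formula, p_enumerated)
```

Both routes count the same 28 equally likely pairs, so they must agree with each other and with the complement-rule result.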
Problem 2: Probability of Drawing Diamonds or Tens
From a standard deck of 52 cards, find the probability of drawing either a diamond or a ten.
Setup
Let’s define our events:
- Let D = “drawing a diamond”
- Let T = “drawing a ten”
We need to find P(D \cup T)
Solution Using the Addition Rule
Individual Probabilities
For diamonds:
- Number of diamonds = 13
- P(D) = \frac{13}{52} = \frac{1}{4}
For tens:
- Number of tens = 4
- P(T) = \frac{4}{52} = \frac{1}{13}
Intersection
- The ten of diamonds is counted in both events
- P(D \cap T) = \frac{1}{52}
Addition Rule
P(D \cup T) = P(D) + P(T) - P(D \cap T)
= \frac{13}{52} + \frac{4}{52} - \frac{1}{52}
= \frac{17}{52} - \frac{1}{52}
= \frac{16}{52} = \frac{4}{13}
\approx 0.308 or about 30.8%
Verification
We can verify this result is reasonable because:
- Direct Count
- There are 13 diamonds plus 3 tens of other suits, giving 16 favorable cards out of 52
- Upper Bound Check
- If we simply added P(D) and P(T): \frac{13}{52} + \frac{4}{52} = \frac{17}{52}
- Our answer must be less than this due to double counting
- \frac{16}{52} < \frac{17}{52} ✓
- Lower Bound Check
- Our answer must be greater than the larger individual probability (\frac{13}{52})
- \frac{16}{52} > \frac{13}{52} ✓
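A direct count over the deck gives an independent check on the arithmetic: 13 diamonds plus the 3 tens of other suits. The sketch below compares the addition rule with the size of the union; the card representation and the helper `prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely cards

diamonds = {card for card in deck if card[1] == "diamonds"}
tens = {card for card in deck if card[0] == "10"}

def prob(event):
    """Classical probability of drawing a card in the event."""
    return Fraction(len(event), len(deck))

by_formula = prob(diamonds) + prob(tens) - prob(diamonds & tens)
by_counting = prob(diamonds | tens)

print(len(diamonds | tens), by_formula, by_counting)
```

The set union counts the ten of diamonds once, which is exactly what subtracting the intersection achieves in the formula.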
Teaching Notes
This problem illustrates several important concepts:
- Addition Rule Application
- Why we can’t simply add probabilities
- The role of intersection in avoiding double counting
- Fraction Arithmetic
- Working with common denominators
- Simplifying fractions (if desired)
- Set Theory Visualization
- The problem can be illustrated with a Venn diagram
- Shows why subtraction of intersection is necessary
- Reasonableness Checks
- Using bounds to verify answers
- Understanding why certain values are impossible
Problem 3a: Introduction to Conditional Probability
A tech company has 100 employees who work on various projects. The company records show that:
- 60 employees work on Project A
- 45 employees work on Project B
- 25 employees work on both projects
The HR manager randomly selects one employee. Given that this employee works on Project A, what is the probability they also work on Project B?
Step-by-Step Solution
Define Events
- Let A = “employee works on Project A”
- Let B = “employee works on Project B”
- We need to find P(B|A)
Review the Conditional Probability Formula
P(B|A) = \frac{P(A \cap B)}{P(A)}
Identify Known Values
- Total employees: n = 100
- Number working on A: n_A = 60
- Number working on B: n_B = 45
- Number working on both: n_{A \cap B} = 25
Calculate Probabilities
- P(A) = \frac{60}{100} = 0.6
- P(A \cap B) = \frac{25}{100} = 0.25
Apply the Formula
P(B|A) = \frac{0.25}{0.6} = \frac{25}{60} \approx 0.417
Interpretation
There is about a 41.7% chance that an employee works on Project B, given that they work on Project A. In other words, among the 60 employees who work on Project A, 25 of them (41.7%) also work on Project B.
Teaching Notes
This problem helps students understand:
- Basic Concepts
- The difference between joint probability P(A \cap B) and conditional probability P(B|A)
- Why P(B|A) is not the same as P(A \cap B)
- The role of the denominator P(A) in “restricting the sample space”
- Visual Representation
- The problem can be illustrated with a Venn diagram:
- One circle for Project A (60)
- One circle for Project B (45)
- Overlap shows both projects (25)
- Total space represents all employees (100)
- Common Misconceptions
- Students often confuse P(B|A) with P(A \cap B)
- They might think P(B|A) = P(B)
- They might mix up P(B|A) and P(A|B)
- Extensions
- Calculate P(A|B) for comparison
- Find the probability of working on exactly one project
- Consider what happens if projects were independent
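The extensions can be worked out from the same four counts. The following is one possible sketch, using only the numbers given in the problem; the variable names are illustrative.

```python
from fractions import Fraction

# Counts from the problem statement.
n_total, n_a, n_b, n_both = 100, 60, 45, 25

p_b_given_a = Fraction(n_both, n_a)  # restrict to Project A staff
p_a_given_b = Fraction(n_both, n_b)  # restrict to Project B staff

# Exactly one project: in A only, or in B only.
p_exactly_one = Fraction((n_a - n_both) + (n_b - n_both), n_total)

# If the projects were independent, P(A and B) would equal P(A) * P(B).
p_both_actual = Fraction(n_both, n_total)
p_both_if_independent = Fraction(n_a, n_total) * Fraction(n_b, n_total)

print(p_b_given_a, p_a_given_b, p_exactly_one)
print(p_both_actual, p_both_if_independent)
```

Comparing `p_b_given_a` with `p_a_given_b` shows that swapping the conditioning event changes the denominator, and the last line shows that the observed overlap is slightly smaller than independence would imply.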
Problem 3b: Colored Balls with Replacement and Addition
Question: A box contains 5 red and 3 green balls. One ball is drawn at random, its color is noted, and it is returned to the box together with one additional ball of the same color. A second ball is then drawn. What is the probability that both balls drawn are green?
Detailed Solution:
This is a sequential probability problem where the probability of the second event depends on the outcome of the first. Let’s solve it step by step:
Define our events:
- G₁ = first ball is green
- G₂ = second ball is green
- We want P(G₁ ∩ G₂)
Calculate P(G₁):
- Initially: 3 green balls out of 8 total
- P(G_1) = \frac{3}{8}
Calculate P(G₂|G₁):
- If first ball was green:
- After replacement and adding another green: 4 green balls out of 9 total
- P(G_2|G_1) = \frac{4}{9}
Apply the multiplication rule:
P(G_1 \cap G_2) = P(G_1) \cdot P(G_2|G_1) = \frac{3}{8} \cdot \frac{4}{9} = \frac{12}{72} = \frac{1}{6} \approx 0.167 or about 16.7%
Understanding the Solution:
- The probability is relatively low because we need two specific events to occur in sequence
- The addition of a ball of the same color as the first draw creates a dependency between the draws
- If we had simply replaced the first ball without adding another, the draws would have been independent
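As a sanity check, the exact answer can be compared with a simulation of the drawing procedure. This is a sketch under the assumptions stated in the problem (5 red, 3 green, the drawn ball returned plus one more of the same color); the function name, seed, and number of trials are arbitrary choices.

```python
import random
from fractions import Fraction

def both_green(rng, red=5, green=3):
    """Simulate one round and report whether both draws are green."""
    first_green = rng.random() < green / (red + green)
    # The first ball goes back, plus one extra ball of the same color.
    if first_green:
        green += 1
    else:
        red += 1
    second_green = rng.random() < green / (red + green)
    return first_green and second_green

# Exact value from the multiplication rule.
exact = Fraction(3, 8) * Fraction(4, 9)

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 200_000
estimate = sum(both_green(rng) for _ in range(trials)) / trials

print(exact)      # 1/6
print(estimate)   # should land close to 0.167
```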
Problem 4a: Bayesian Analysis of Medical Test Results
A medical test for disease D has the following characteristics:
- Sensitivity (true positive rate): P(T=1|D=1) = 0.95
- Specificity (true negative rate): P(T=0|D=0) = 0.95
- Prior probability (disease prevalence): P(D=1) = 0.001 (1/1000)
- Test result for Alicia: Positive (T=1)
We need to find the posterior probability that Alicia has the disease given a positive test result: P(D=1|T=1)
Derivation Using Bayes’ Theorem
Starting with the conditional probability formula:
P(D=1|T=1) = \frac{P(T=1|D=1)P(D=1)}{P(T=1)}
The denominator P(T=1) can be expanded using the law of total probability:
P(T=1) = P(T=1|D=1)P(D=1) + P(T=1|D=0)P(D=0)
Components Analysis
Prior: P(D=1) = 0.001
- Complement: P(D=0) = 0.999
Likelihood:
- P(T=1|D=1) = 0.95 (sensitivity)
- P(T=0|D=0) = 0.95 (specificity)
- P(T=1|D=0) = 1 - P(T=0|D=0) = 0.05 (false positive rate)
Total Probability (denominator): P(T=1) = (0.95)(0.001) + (0.05)(0.999) = 0.00095 + 0.04995 = 0.0509
Posterior Calculation:
P(D=1|T=1) = \frac{(0.95)(0.001)}{0.0509} = \frac{0.00095}{0.0509} \approx 0.0187
Interpretation
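Despite receiving a positive test result, the probability that Alicia has disease D is only about 1.87%. This counterintuitive result is sometimes called the “Bayesian flip”; expecting a probability close to 95% instead, by ignoring how rare the disease is, is known as the “base rate fallacy.”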
Why is the Probability So Low?
- Base Rate Consideration:
- The very low prevalence (1/1000) means that in a population of 1000 women:
- 1 woman has the disease
- 999 women don’t have the disease
- Test Results in Population:
- Of the 1 woman with disease:
- 0.95 will test positive (true positive)
- Of the 999 women without disease:
- About 50 will test positive (false positives)
- Ratio Analysis:
- Among all positive tests (≈51), only about 1 is a true positive
- This explains why P(D=1|T=1) is so low
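The posterior calculation and the population counts above can be checked together. The sketch below wraps Bayes' Theorem in a small helper; the function name `posterior_positive` and its parameter names are illustrative, not part of any library.

```python
def posterior_positive(sensitivity, specificity, prevalence):
    """Return P(D=1 | T=1) using Bayes' Theorem."""
    false_positive_rate = 1 - specificity
    # Law of total probability for a positive test.
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1 - prevalence))
    return sensitivity * prevalence / p_positive

ppv = posterior_positive(sensitivity=0.95, specificity=0.95, prevalence=0.001)
print(round(ppv, 4))  # 0.0187

# The same answer in "natural frequencies" for a population of 1000.
population = 1000
true_pos = population * 0.001 * 0.95    # about 1 person
false_pos = population * 0.999 * 0.05   # about 50 people
print(round(true_pos / (true_pos + false_pos), 4))  # 0.0187
```

The second computation mirrors the counting argument: roughly one true positive among about 51 positive tests.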
Teaching Notes
This problem illustrates several important concepts:
The distinction between conditional probabilities:
- P(T=1|D=1) (sensitivity)
- P(D=1|T=1) (positive predictive value)
The crucial role of base rates in Bayesian reasoning
Why medical professionals should:
- Consider prevalence when interpreting test results
- Be cautious about testing asymptomatic patients
- Consider confirmatory testing for positive results
The importance of communicating probabilistic information effectively to patients
The mathematical relationship between:
- Prior probabilities
- Test characteristics (sensitivity/specificity)
- Posterior probabilities
Problem 4b: COVID-19 Test Analysis
Question: Given a COVID-19 test with:
- Sensitivity (P(T=1|D=1)) = 87.5%
- Specificity (P(T=0|D=0)) = 97.5%
- Disease prevalence (P(D=1)) = 10%
Find P(D=1|T=1), the probability that a person with a positive test actually has the disease.
Detailed Solution:
This is a perfect application of Bayes’ Theorem. Let’s break it down:
Define our variables:
- D=1: Person has COVID-19
- D=0: Person doesn’t have COVID-19
- T=1: Test is positive
- T=0: Test is negative
Given information:
- P(T=1|D=1) = 0.875 (sensitivity)
- P(T=0|D=0) = 0.975 (specificity)
- P(D=1) = 0.1 (prevalence)
Calculate additional probabilities:
- P(D=0) = 1 - P(D=1) = 0.9
- P(T=1|D=0) = 1 - P(T=0|D=0) = 0.025 (false positive rate)
Apply Bayes’ Theorem: P(D=1|T=1) = \frac{P(T=1|D=1) \cdot P(D=1)}{P(T=1)}
Calculate P(T=1) using the law of total probability: P(T=1) = P(T=1|D=1)P(D=1) + P(T=1|D=0)P(D=0) = (0.875)(0.1) + (0.025)(0.9) = 0.0875 + 0.0225 = 0.11
Now we can complete Bayes’ Theorem: P(D=1|T=1) = \frac{(0.875)(0.1)}{0.11} = \frac{0.0875}{0.11} \approx 0.795 or about 79.5%
Understanding the Result:
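This result tells us that even with a positive test, there’s still about a 20.5% chance that the person doesn’t have COVID-19. This might seem surprising, but it’s due to the relatively low prevalence of the disease (10%) in the population: the 90% of people without the disease generate enough false positives (2.25% of everyone tested) to make up about a fifth of all positive results (11% of everyone tested). Overlooking this effect is known as the base rate fallacy - even a test with good sensitivity and specificity can produce a substantial share of false positives among its positive results when the condition being tested for is uncommon.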
Problem 5: Conditional Probability: Marble Drawing with Coin Flip
We have a probability experiment involving two boxes of marbles and a fair coin:
Box X1:
- 2 black marbles
- 3 red marbles
- Total: 5 marbles
Box X2:
- 1 black marble
- 1 red marble
- Total: 2 marbles
A fair coin is flipped to select the box (heads for X1, tails for X2), then one marble is drawn.
Visual Representation
Let’s create a tree diagram to visualize all possible outcomes and their probabilities:
graph TD A[Start] --> B[X1 1/2] A --> C[X2 1/2] B --> D[Black 2/5] B --> E[Red 3/5] C --> F[Black 1/2] C --> G[Red 1/2] D --> H[Black & X1] E --> I[Red & X1] F --> J[Black & X2] G --> K[Red & X2]
Solution
Let’s solve each part step by step:
P(Black | X1)
This is the probability of drawing a black marble given that we selected Box X1.
P(Black | X1) = \frac{\text{Number of black marbles in X1}}{\text{Total marbles in X1}} = \frac{2}{5}
This is a direct probability from the contents of Box X1. We only consider Box X1’s marbles since we’re given that Box X1 was selected.
P(Black and X1)
This is the probability of both selecting Box X1 and drawing a black marble.
P(Black and X1) = P(X1) × P(Black | X1) = \frac{1}{2} \times \frac{2}{5} = \frac{1}{5}
We multiply these probabilities because both events must occur (intersection).
P(Black)
This is the total probability of drawing a black marble from either box. We use the law of total probability:
P(Black) = P(X1) × P(Black | X1) + P(X2) × P(Black | X2) = \frac{1}{2} \times \frac{2}{5} + \frac{1}{2} \times \frac{1}{2} = \frac{1}{5} + \frac{1}{4} = \frac{4}{20} + \frac{5}{20} = \frac{9}{20}
P(X1 | Black)
This is the probability that we selected Box X1 given that we drew a black marble. We use Bayes’ Theorem:
P(X1 | Black) = \frac{P(Black | X1) \times P(X1)}{P(Black)} = \frac{\frac{2}{5} \times \frac{1}{2}}{\frac{9}{20}} = \frac{\frac{1}{5}}{\frac{9}{20}} = \frac{4}{9}
Key Concepts Demonstrated
Conditional Probability: Shown in P(Black | X1), where we consider probability within a subset of outcomes
Multiplication Rule: Used in finding P(Black and X1), where we multiply probabilities of sequential events
Law of Total Probability: Applied in finding P(Black), where we consider all possible ways an event can occur
Bayes’ Theorem: Used to find P(X1 | Black), reversing the direction of conditioning
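The four results can be reproduced in a few lines. This is a minimal sketch using exact fractions; the dictionary names are illustrative, and the numbers come directly from the box contents and the fair coin.

```python
from fractions import Fraction

# Prior: a fair coin selects the box.
p_box = {"X1": Fraction(1, 2), "X2": Fraction(1, 2)}
# Conditional probability of black, read off each box's contents.
p_black_given_box = {"X1": Fraction(2, 5), "X2": Fraction(1, 2)}

# Multiplication rule: P(Black and box) for each box.
joint = {box: p_box[box] * p_black_given_box[box] for box in p_box}
# Law of total probability: add the joint probabilities.
p_black = sum(joint.values())
# Bayes' Theorem: P(box | Black).
posterior = {box: joint[box] / p_black for box in p_box}

print(joint["X1"])      # 1/5
print(p_black)          # 9/20
print(posterior["X1"])  # 4/9
```

Note that the posterior probabilities for the two boxes add up to 1, which is a useful check on any Bayes calculation.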
Problem 6: Probability of Intersecting Events and Independence Analysis
You roll a fair die. What is the probability of getting an even number (A) and a number greater than or equal to 4 (B)? Are events A and B independent?
Let’s explore this problem by first understanding what each event means, then calculating their probabilities both separately and together, and finally examining their independence.
Understanding the Events
Let’s first identify what numbers satisfy each condition on a standard six-sided die:
- Event A (Even numbers): {2, 4, 6}
- Event B (Numbers ≥ 4): {4, 5, 6}
We can visualize this using a Venn diagram with two overlapping circles inside the sample space {1, 2, 3, 4, 5, 6}:
- Only in A: {2}
- In both A and B: {4, 6}
- Only in B: {5}
- In neither: {1, 3}
Calculating P(A ∩ B)
To find the probability of getting both an even number AND a number greater than or equal to 4:
- First, let’s identify the numbers that satisfy both conditions:
- Must be even AND ≥ 4
- Numbers that satisfy both: {4, 6}
- Therefore: P(A ∩ B) = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}} = \frac{2}{6} = \frac{1}{3}
Testing for Independence
To determine if events A and B are independent, we need to check if: P(A ∩ B) = P(A) × P(B)
Let’s calculate each probability:
P(A) = P(even number) = \frac{3}{6} = \frac{1}{2}
- Favorable outcomes: {2, 4, 6}
P(B) = P(number ≥ 4) = \frac{3}{6} = \frac{1}{2}
- Favorable outcomes: {4, 5, 6}
P(A) × P(B) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}
Compare:
- P(A ∩ B) = \frac{1}{3}
- P(A) × P(B) = \frac{1}{4}
Since \frac{1}{3} \neq \frac{1}{4}, events A and B are NOT independent.
Understanding the Meaning of Dependence
This dependence makes intuitive sense because:
- Knowing a number is even affects the probability it’s ≥ 4
- If we know we rolled an even number, there are three possibilities (2, 4, 6)
- Within these possibilities, the probability of getting ≥ 4 is \frac{2}{3}, not \frac{1}{2}
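This illustrates an important principle: events can be dependent even when they don’t seem directly related. Here the overlap {4, 6} is larger than independence would predict (2 of the 6 outcomes rather than the 1.5 implied by \frac{1}{4}), so learning that one event occurred raises the probability of the other.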
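Because the sample space is tiny, the independence test can be carried out by direct enumeration. The sketch below represents events as Python sets; the helper `prob` is an illustrative name and assumes all six outcomes are equally likely.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                  # a fair six-sided die
A = {x for x in outcomes if x % 2 == 0}      # even numbers
B = {x for x in outcomes if x >= 4}          # numbers >= 4

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)
independent = (p_ab == p_a * p_b)
p_b_given_a = p_ab / p_a

print(p_ab, p_a * p_b, independent)  # 1/3 1/4 False
print(p_b_given_a)                   # 2/3
```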
Teaching Extension
To deepen understanding, consider this question: How would the independence calculation change if we used “numbers less than 4” instead of “numbers greater than or equal to 4”? This variation helps illustrate how the structure of event spaces influences their independence.
Problem 7: The Monty Hall Problem - Two Solution Approaches
Let’s analyze this fascinating probability problem that has puzzled many people, including mathematicians. We’ll solve it using both a tree diagram and conditional probability to build a complete understanding.
Problem Statement
The Monty Hall problem:
- There are three doors: behind one is a car, behind the others are goats
- You pick a door
- Monty Hall (who knows what’s behind each door) opens another door, always showing a goat
- You’re offered the chance to switch to the remaining door
- Question: Should you switch? What’s the probability of winning if you switch vs. if you stay?
Approach 1: Tree Diagram Solution
Let’s visualize all possible scenarios:
graph TD A[Initial Choice] --> B[Car 1/3] A --> C[Goat1 1/3] A --> D[Goat2 1/3] B --> E[Monty Shows Goat2] B --> F[Monty Shows Goat1] C --> G[Monty Must Show Goat2] D --> H[Monty Must Show Goat1] E --> I[Switch loses] F --> J[Switch loses] G --> K[Switch wins] H --> L[Switch wins] style I fill:#ffcccc style J fill:#ffcccc style K fill:#ccffcc style L fill:#ccffcc
Analyzing the outcomes:
- If you initially picked the car (1/3 chance):
- Monty can show either goat
- Switching loses
- If you initially picked a goat (2/3 chance):
- Monty must show the other goat
- Switching wins
Therefore:
- P(win if stay) = \frac{1}{3}
- P(win if switch) = \frac{2}{3}
Approach 2: Conditional Probability Solution
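Let’s use Bayes’ Theorem to solve this. Suppose, without loss of generality, that you initially pick Door 1. Define events:
- Car in i: the car is behind Door i (for i = 1, 2, 3)
- M₂: Monty opens Door 2 showing a goat
P(Car behind Door 3 | Monty opens Door 2) = ?
We can write: P(Car in 3 | M₂) = \frac{P(M₂|Car in 3) \times P(Car in 3)}{P(M₂)}
Let’s calculate each term:
- P(Car in 3) = \frac{1}{3} (prior probability)
- P(M₂|Car in 3) = 1 (you hold Door 1 and the car is behind Door 3, so Door 2 is the only door Monty can open)
- P(M₂) = P(M₂|Car in 1) × P(Car in 1) + P(M₂|Car in 2) × P(Car in 2) + P(M₂|Car in 3) × P(Car in 3) = \frac{1}{2} \times \frac{1}{3} + 0 \times \frac{1}{3} + 1 \times \frac{1}{3} = \frac{1}{6} + \frac{1}{3} = \frac{1}{2}
Here P(M₂|Car in 1) = \frac{1}{2} assumes that Monty chooses at random between Doors 2 and 3 when both hide goats, and P(M₂|Car in 2) = 0 because Monty never reveals the car.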
Therefore:
P(Car in 3 | M₂) = \frac{1 \times \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}
Key Insights
- Why Intuition Fails:
- People often think it’s 50-50 after Monty opens a door
- This ignores the crucial fact that Monty’s choice is informed, not random
- His action provides information that should update our probabilities
- Information Value:
- Monty’s choice is constrained (must show a goat)
- This constraint carries information
- The probability shifts from the initial \frac{1}{3} to \frac{2}{3} for switching
- Simulation Verification: We could write a simple program to simulate this game thousands of times, and it would confirm these probabilities. The most convincing evidence is often seeing the results empirically.
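A simulation along these lines might look as follows. This is a minimal sketch, assuming Monty picks at random when he has two goat doors to choose from; the function name `play`, the seed, and the number of trials are arbitrary choices.

```python
import random

def play(rng, switch):
    """Play one game; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the player's pick nor the car.
    monty = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != monty)
    return pick == car

rng = random.Random(42)  # fixed seed so the run is repeatable
trials = 100_000
stay_rate = sum(play(rng, switch=False) for _ in range(trials)) / trials
switch_rate = sum(play(rng, switch=True) for _ in range(trials)) / trials

print(f"stay:   {stay_rate:.3f}")    # should be close to 1/3
print(f"switch: {switch_rate:.3f}")  # should be close to 2/3
```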
Problem 8: The Bertrand Box Paradox - A Teaching Analysis
Understanding the Problem Setup
First, let’s clearly state what we’re dealing with:
We have three boxes:
- Box 1: Contains two gold coins (GG)
- Box 2: Contains two silver coins (SS)
- Box 3: Contains one gold and one silver coin (GS)
The process:
- We randomly select a box
- We randomly draw one coin from the chosen box
- If we see a gold coin, what’s the probability it came from the gold-only box?
Most people intuitively answer \frac{1}{2}, but let’s discover why this isn’t correct.
Approach 1: Tree Diagram Analysis
Let’s visualize all possible paths and outcomes:
graph TD A[Start] --> B[Box GG 1/3] A --> C[Box SS 1/3] A --> D[Box GS 1/3] B --> E[Draw G 1] C --> F[Draw S 1] D --> G[Draw G 1/2] D --> H[Draw S 1/2] E --> I[Saw Gold] G --> I[Saw Gold] F --> J[Saw Silver] H --> J[Saw Silver] style I fill:#FFD700 style J fill:#C0C0C0
Following the paths where we see gold:
- From Box GG (probability = \frac{1}{3} \times 1 = \frac{1}{3})
- From Box GS (probability = \frac{1}{3} \times \frac{1}{2} = \frac{1}{6})
Therefore:
- Total probability of seeing gold = \frac{1}{3} + \frac{1}{6} = \frac{1}{2}
- Given we saw gold, probability it came from Box GG = \frac{\frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}
Approach 2: Bayes’ Theorem Solution
Let’s solve this formally using Bayes’ Theorem:
P(Box GG | Gold) = \frac{P(Gold|Box GG) \times P(Box GG)}{P(Gold)}
Let’s calculate each component:
- P(Gold|Box GG) = 1 (certainty of drawing gold)
- P(Box GG) = \frac{1}{3} (equal box probabilities)
- P(Gold) = \frac{1}{3} \times 1 + \frac{1}{3} \times 0 + \frac{1}{3} \times \frac{1}{2} = \frac{1}{2}
Putting it together:
P(Box GG | Gold) = \frac{1 \times \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}
Why This Is Counterintuitive
The reason many people get this wrong reveals interesting aspects of how we think about probability:
The Setup Trick: People often think, “If I see gold, it must be from either Box GG or Box GS, so it’s 50-50.” This ignores the fact that Box GG has twice the opportunity to show gold.
Prior vs Posterior: The problem shows how observing evidence (seeing gold) updates our prior probability (\frac{1}{3}) to a posterior probability (\frac{2}{3}).
Sample Space Structure: Box GG contributes more gold coins to the total sample space of possible draws than Box GS does.
A Teaching Analogy
Think of it this way: Imagine three people named GG, SS, and GS.
- GG always raises both hands when asked
- SS never raises hands
- GS raises one hand
If you see a raised hand randomly, it’s more likely to belong to GG (who contributes two hands) than GS (who contributes only one).
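The same counting argument can be written out by listing every coin that could be drawn. In this sketch each (box, coin) pair is treated as equally likely, which follows from choosing a box uniformly and then a coin uniformly within it; the variable names are illustrative.

```python
from fractions import Fraction

boxes = {"GG": ["G", "G"], "SS": ["S", "S"], "GS": ["G", "S"]}

# Six equally likely draws: 1/3 for the box times 1/2 for the coin.
draws = [(name, coin) for name, coins in boxes.items() for coin in coins]

# Keep only the draws consistent with the evidence "saw gold".
gold_draws = [name for name, coin in draws if coin == "G"]
p_gg_given_gold = Fraction(gold_draws.count("GG"), len(gold_draws))

print(gold_draws)        # ['GG', 'GG', 'GS']
print(p_gg_given_gold)   # 2/3
```

Two of the three gold coins live in Box GG, which is the "two hands" of the analogy.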
Extension for Deeper Understanding
To reinforce this concept, consider: How would the probabilities change if we had:
- Three coins in each box?
- Different prior probabilities for selecting each box?
- The ability to see both coins but only after selecting a box?
17.14 Appendix 1. Advanced Counting in Probability: A Student Guide (*)
Poker Hands: A Window into Complex Counting
Poker hands provide some of the most interesting examples for understanding counting in probability. They’re perfect for learning because they combine multiple counting principles and help us understand common pitfalls. Let’s explore these concepts step by step.
Understanding Our Sample Space
Before we dive into specific hands, let’s understand what we’re working with. A poker hand consists of 5 cards drawn from a standard 52-card deck. Understanding the sample space is crucial because it forms the foundation of all our probability calculations.
The total number of possible poker hands represents how many different ways we can select 5 cards from 52 cards, where the order doesn’t matter (getting ace-king-queen is the same hand as getting king-queen-ace), we can’t reuse cards (we can’t have the ace of spades twice in our hand), and we must take exactly 5 cards (not more, not less).
This means we’re dealing with combinations. Let’s calculate this step by step:
\binom{52}{5} = \frac{52!}{5!(52-5)!} = \frac{52!}{5!(47)!} = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2,598,960
This number, 2,598,960, will be our denominator for calculating the probability of any specific poker hand.
Understanding Two Pairs: A Careful Counting Approach
Two pairs is one of the most interesting hands for understanding counting principles. To get two pairs, we need:
- Two cards of one rank
- Two cards of another rank
- One card of a third rank (the kicker)
Let’s build this hand step by step, being careful to understand each choice we make:
First, let’s select our ranks. We might think we should just choose two ranks from 13 for our pairs using \binom{13}{2}, but this approach hides some important subtleties. Instead, let’s think about the actual process of constructing the hand:
- We have 13 possible ranks for our first pair
- After choosing the first pair’s rank, we have 12 ranks left for our second pair
- After choosing both pair ranks, we have 11 ranks left for our kicker
For each rank we’ve chosen, we need to select specific cards:
- For our first pair: we choose 2 cards from the 4 available cards of that rank: \binom{4}{2} = 6 ways
- For our second pair: again \binom{4}{2} = 6 ways
- For our kicker: we choose 1 card from 4: \binom{4}{1} = 4 ways
Now, here’s where many students get confused: Does it matter which pair we count “first” and which we count “second”? The answer reveals a deep truth about counting in probability.
Let’s use a concrete example. Suppose we want two pairs with Aces and Kings, and a Two as our kicker. We could:
- Choose Aces as our first pair, then Kings as our second pair
- Choose Kings as our first pair, then Aces as our second pair
These two paths lead to exactly the same hands, because a hand is an unordered set of cards. The sequential count 13 × 12 therefore counts every choice of pair ranks twice, and we must divide by 2 to undo this overcounting. Equivalently, we choose the two pair ranks as an unordered set in \binom{13}{2} = \frac{13 \cdot 12}{2} = 78 ways. The kicker needs no such correction, because its rank plays a different role from the pair ranks.
This is why our final formula multiplies these choices and then divides by 2:
\frac{13 \cdot 12}{2} (two pair ranks) × 11 (kicker rank) × \binom{4}{2} (first pair cards) × \binom{4}{2} (second pair cards) × \binom{4}{1} (kicker card) = 78 \cdot 11 \cdot 6 \cdot 6 \cdot 4 = 123,552
Each term represents a separate decision we make in constructing the hand. Whenever two different sequences of decisions produce the same final hand, we have to correct for the duplication to get the correct total.
Let’s calculate the total probability:
P(\text{two pairs}) = \frac{\binom{13}{2} \cdot 11 \cdot \binom{4}{2} \cdot \binom{4}{2} \cdot \binom{4}{1}}{2,598,960} = \frac{123,552}{2,598,960} \approx 0.0475
This means about 4.75% of all possible poker hands are two pairs.
Understanding Full House: A Different Counting Challenge
A full house gives us a perfect contrast to two pairs. While both hands involve multiple cards of the same rank, the counting process reveals important differences in how we approach probability problems.
In a full house, we need:
- Three cards of one rank (called “three of a kind”)
- Two cards of another rank (a pair)
Let’s think about why counting a full house is different from counting two pairs. With two pairs, we had to be careful about the order of selecting our pairs. With a full house, the two ranks play different roles: one supplies three cards and the other supplies two. Choosing Aces for the three of a kind and Kings for the pair gives a different hand than the reverse, so the ordered count 13 × 12 is exactly right and no correction is needed.
Let’s count step by step:
- For the three of a kind:
- Choose the rank: 13 possible ranks
- Choose which three cards of that rank: \binom{4}{3} = 4 ways
- For the pair:
- Choose the rank: 12 remaining ranks
- Choose which two cards of that rank: \binom{4}{2} = 6 ways
Multiplying these together:
13 (three of a kind rank) × \binom{4}{3} (specific three cards) × 12 (pair rank) × \binom{4}{2} (specific pair cards)
= 13 \cdot 4 \cdot 12 \cdot 6 = 3,744
Therefore:
P(\text{full house}) = \frac{3,744}{2,598,960} \approx 0.0014
About 0.14% of all poker hands are full houses, making them significantly rarer than two pairs (4.75%). This makes intuitive sense - it’s harder to get three of the same rank plus a pair than to get two pairs plus a kicker.
The Birthday Problem: A Beautiful Probability Surprise
The birthday problem provides a fascinating connection to our poker probability work, while teaching us something profound about the nature of counting. The classic question is: “How many people need to be in a room for there to be a 50% chance that at least two share a birthday?”
Most people guess around 183 (half of 365), but the actual answer is just 23 people! Let’s understand why this connects to our previous counting work and why the answer is so surprising.
First, let’s think about what makes this problem different from our poker calculations:
- In poker, we were looking for specific combinations (like two pairs)
- In the birthday problem, we’re looking for any match at all
This is similar to the difference between asking:
- “What’s the probability of drawing the ace of spades and king of hearts specifically?”
- “What’s the probability of drawing any two cards of different ranks?”
The second question has many more ways to succeed.
Let’s solve the birthday problem step by step:
- First, it’s easier to calculate the probability of no matches
- Then we can subtract from 1 to get the probability of at least one match
For 23 people, we calculate no matches like this:
- First person can have any birthday: \frac{365}{365}
- Second person needs a different birthday: \frac{364}{365}
- Third person needs a different birthday: \frac{363}{365}
And so on until person 23.
This gives us:
P(\text{no matches}) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdot ... \cdot \frac{343}{365}
= \frac{365!}{(365-23)! \cdot 365^{23}} \approx 0.493
Therefore:
P(\text{at least one match}) = 1 - 0.493 \approx 0.507
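The product is tedious by hand but short in code. The following sketch multiplies the "no match" factors one person at a time, assuming 365 equally likely birthdays and ignoring leap years; the function name is illustrative.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_no_match = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p_no_match *= (days - i) / days
    return 1 - p_no_match

print(round(p_shared_birthday(23), 3))  # 0.507

# Smallest group size with at least a 50% chance of a match.
smallest = next(n for n in range(1, 366) if p_shared_birthday(n) >= 0.5)
print(smallest)  # 23
```

Calling the function for 22 people gives a value just under one half, which confirms that 23 is the threshold.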
This teaches us something profound about probability: when we’re looking for any match among many possibilities (like in the birthday problem), we often get much higher probabilities than when we’re looking for specific matches (like in poker hands).
Lottery Mathematics
Let’s apply everything we’ve learned to understand lottery probabilities. Consider a typical “6/49” lottery where players choose 6 numbers from 1-49. This gives us a perfect opportunity to apply our counting principles in a real-world context.
The fundamental question is: What’s the probability of winning the jackpot (matching all 6 numbers)?
This is a combination problem because:
- Order doesn’t matter (matching 1-2-3-4-5-6 is the same as matching 6-5-4-3-2-1)
- We can’t use the same number twice
- We need exactly 6 numbers
Therefore:
P(\text{jackpot}) = \frac{1}{\binom{49}{6}} = \frac{1}{13,983,816}
This tiny probability (about 0.0000000715) shows why lottery wins are so rare. But modern lotteries have multiple prize tiers, which gives us a chance to explore more interesting probability calculations.
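Consider matching exactly 5 of the 6 winning numbers. For this, we need to:
1. Match 5 of the 6 winning numbers: \binom{6}{5} ways to choose which 5
2. Fill the sixth spot on our ticket with one of the 43 non-winning numbers: \binom{43}{1} ways
Therefore:
P(\text{exactly 5 matches}) = \frac{\binom{6}{5} \cdot \binom{43}{1}}{\binom{49}{6}} = \frac{6 \cdot 43}{13,983,816} \approx 0.0000184
If the lottery also draws a single bonus number from the 43 non-winning numbers, the “5 + bonus” tier requires our sixth number to be that bonus number, which leaves only 1 way instead of 43:
P(\text{5 + bonus}) = \frac{\binom{6}{5} \cdot 1}{\binom{49}{6}} = \frac{6}{13,983,816} \approx 0.00000043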
This shows us how breaking down complex probability problems into simpler parts helps us solve them systematically.
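These counts are easy to verify with the standard library. The sketch below counts tickets that match exactly m of the six winning numbers; the helper `ways_to_match` is an illustrative name, and the final check confirms that the counts for m = 0, ..., 6 cover every possible ticket.

```python
from math import comb

total_tickets = comb(49, 6)

def ways_to_match(m, picked=6, pool=49):
    """Number of tickets that match exactly m of the winning numbers."""
    # Choose m winning numbers and fill the rest with non-winning ones.
    return comb(picked, m) * comb(pool - picked, picked - m)

print(total_tickets)                  # 13983816
print(ways_to_match(6))               # 1
print(ways_to_match(5))               # 258, i.e. 6 * 43
print(ways_to_match(5) / total_tickets)

# Every ticket matches some number of winning values between 0 and 6.
all_cases = sum(ways_to_match(m) for m in range(7))
print(all_cases == total_tickets)     # True
```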
Appendix 2. Alternative Approaches to Poker Hand Probabilities (*)
Understanding different ways to calculate the same probability deepens our insight into counting principles. Let’s explore several methods for finding the probabilities of two pairs and full house, seeing how each approach highlights different aspects of the problem.
Multiple Paths to Two Pairs Probability
Let’s start with two pairs. We’ve seen one method, but there are several valid approaches:
Method 1: Sequential Selection (Our Original Approach)
We build the hand step by step:
1. Choose first pair’s rank: 13 ways
2. Choose second pair’s rank: 12 ways
3. Choose kicker’s rank: 11 ways
4. Choose specific cards for first pair: \binom{4}{2} ways
5. Choose specific cards for second pair: \binom{4}{2} ways
6. Choose specific card for kicker: \binom{4}{1} ways
7. Divide by 2, because picking the two pair ranks in either order produces the same hand
This gives us: P(\text{two pairs}) = \frac{13 \cdot 12 \cdot 11 \cdot \binom{4}{2} \cdot \binom{4}{2} \cdot \binom{4}{1}}{2 \cdot 2,598,960}
Method 2: Complementary Counting
We can find two pairs probability by subtracting the probability of all other possible hands from 1. However, this is more complex than direct counting because we need to know the probabilities of all other poker hands. Still, it serves as a good verification:
P(\text{two pairs}) = 1 - P(\text{high card}) - P(\text{one pair}) - P(\text{three of a kind}) - P(\text{straight}) - P(\text{flush}) - P(\text{full house}) - P(\text{four of a kind}) - P(\text{straight flush})
Method 3: Using Permutations with Adjustment
We can use permutations and then adjust for overcounting:
- Choose an ordered arrangement of two ranks for pairs: P(13,2) = 13 \cdot 12
- Choose kicker rank: 11 ways
- Choose specific cards for pairs and kicker: \binom{4}{2} \cdot \binom{4}{2} \cdot \binom{4}{1}
- Divide by 2 to account for the fact that the order of pairs doesn’t matter
This gives: P(\text{two pairs}) = \frac{P(13,2) \cdot 11 \cdot \binom{4}{2} \cdot \binom{4}{2} \cdot \binom{4}{1}}{2 \cdot 2,598,960}
Method 4: Combination-Based Approach with Multiplication Principle
We can separate rank selection from card selection:
- First, select three ranks: \binom{13}{3} ways
- From these three ranks, designate two for pairs and one for kicker: \binom{3}{2} ways
- For each pair rank, select two cards: \binom{4}{2} \cdot \binom{4}{2} ways
- For the kicker rank, select one card: \binom{4}{1} ways
This gives us: P(\text{two pairs}) = \frac{\binom{13}{3} \cdot \binom{3}{2} \cdot \binom{4}{2} \cdot \binom{4}{2} \cdot \binom{4}{1}}{2,598,960}
Alternative Approaches to Full House Probability
The full house probability can also be calculated in several ways:
Method 1: Direct Sequential Selection (Our Original Approach)
1. Choose rank for three of a kind: 13 ways
2. Choose specific three cards: \binom{4}{3} ways
3. Choose rank for pair: 12 ways
4. Choose specific two cards: \binom{4}{2} ways
Leading to: P(\text{full house}) = \frac{13 \cdot \binom{4}{3} \cdot 12 \cdot \binom{4}{2}}{2,598,960}
Method 2: Using Combinations with Distribution
We can think about it as:
1. Choose two ranks from 13: \binom{13}{2} ways
2. Designate which rank gets three cards: 2 ways (since either rank could be the three of a kind)
3. Choose specific cards: \binom{4}{3} \cdot \binom{4}{2} ways
This gives: P(\text{full house}) = \frac{\binom{13}{2} \cdot 2 \cdot \binom{4}{3} \cdot \binom{4}{2}}{2,598,960}
Method 3: Using the Multiplication Principle with Sets
Think about constructing the hand as selecting two sets of cards:
1. First set: three cards of the same rank from 13 ranks
- Choose rank: 13 ways
- Choose three cards: \binom{4}{3} ways
2. Second set: two cards of the same rank from 12 remaining ranks
- Choose rank: 12 ways
- Choose two cards: \binom{4}{2} ways
This yields the same result: P(\text{full house}) = \frac{13 \cdot \binom{4}{3} \cdot 12 \cdot \binom{4}{2}}{2,598,960}
Each method illuminates different aspects of the counting process:
- Sequential selection helps us understand the step-by-step construction of hands
- Combination-based approaches highlight the underlying structure of the selections
- Permutation-based methods with adjustment show how overcounting can be handled systematically
The fact that all these methods yield the same result serves as a powerful verification tool. When solving complex probability problems, being able to approach the solution in multiple ways not only confirms our answer but also deepens our understanding of the underlying counting principles.
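This kind of cross-check is quick to automate. The sketch below evaluates the permutation-with-adjustment and combination-based formulas for two pairs, and the sequential and combination-based formulas for a full house; the variable names are illustrative, and `math.perm` requires Python 3.8 or later.

```python
from math import comb, perm

hands = comb(52, 5)  # 2,598,960 possible five-card hands

# Card choices shared by every two-pairs method: two pairs and a kicker.
card_choices = comb(4, 2) * comb(4, 2) * comb(4, 1)

# Two pairs: ordered pair ranks, then divide by 2 for the double count.
two_pairs_perm = perm(13, 2) * 11 * card_choices // 2
# Two pairs: pick three ranks, then decide which two become the pairs.
two_pairs_comb = comb(13, 3) * comb(3, 2) * card_choices

# Full house: sequential selection vs. choosing two ranks and a role.
full_house_seq = 13 * comb(4, 3) * 12 * comb(4, 2)
full_house_comb = comb(13, 2) * 2 * comb(4, 3) * comb(4, 2)

print(two_pairs_perm, two_pairs_comb)    # 123552 123552
print(full_house_seq, full_house_comb)   # 3744 3744
print(round(two_pairs_perm / hands, 4))  # 0.0475
print(round(full_house_seq / hands, 4))  # 0.0014
```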
Appendix 3: Occupancy Problems and Statistical Physics (*)
Understanding how objects can be distributed into containers forms the foundation for both probability theory and statistical mechanics. Let’s explore this connection, starting with basic counting principles and building up to physical applications.
The Basic Occupancy Problem
Imagine we have n balls and k distinct boxes. How many ways can we distribute the balls? The answer depends on whether the balls are distinguishable and on how many balls a box may hold. This simple question leads us to three fundamentally different scenarios that mirror important physical systems:
- Unrestricted occupancy (Bose-Einstein statistics)
- Each box can hold any number of balls
- The balls are indistinguishable
- Like photons in quantum states
- Maximum one per box (Fermi-Dirac statistics)
- Each box can hold at most one ball
- The balls are indistinguishable
- Like electrons in atomic orbitals
- All arrangements count separately (Maxwell-Boltzmann statistics)
- Each box can hold any number of balls
- The balls are distinguishable
- Like classical gas molecules
Stars and Bars: Understanding Unrestricted Occupancy
Let’s start with the Bose-Einstein case. The “stars and bars” method provides a beautiful way to visualize and count these arrangements.
Imagine n=5 balls and k=3 boxes. We can represent any arrangement as a sequence of stars and bars:
- Stars (*) represent balls
- Bars (|) separate different boxes
For example:
- ** | ** | * represents 2 balls in first box, 2 in second, 1 in third
- ***** | | represents all 5 balls in first box, none in others
- | ***** | represents all 5 balls in middle box
The key insight is that we need:
- n stars (one for each ball)
- k-1 bars (to create k sections)
Therefore, we’re really just choosing positions for the k-1 bars among n+(k-1) total positions. This gives us:
\text{Number of arrangements} = \binom{n+k-1}{k-1} = \binom{n+k-1}{n}
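The bar-placement argument translates directly into code. The sketch below lists every arrangement for n = 5 balls and k = 3 boxes by choosing positions for the bars and reading off the gaps between them; the generator name `occupancies` is illustrative.

```python
from itertools import combinations
from math import comb

def occupancies(n, k):
    """Yield every way to place n identical balls into k distinct boxes."""
    slots = n + k - 1  # total positions for stars and bars
    for bars in combinations(range(slots), k - 1):
        # Add sentinel bars at both ends; box sizes are the gaps between bars.
        edges = (-1,) + bars + (slots,)
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(k))

arrangements = list(occupancies(5, 3))
print(len(arrangements), comb(5 + 3 - 1, 3 - 1))  # 21 21
print((2, 2, 1) in arrangements)                   # True, this is ** | ** | *
```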
From Counting to Physics
Now let’s see how these counting principles reveal deep physical truths:
Bose-Einstein Statistics (Unrestricted, Indistinguishable)
- Think of photons in a laser
- Many particles can occupy same energy state
- Total arrangements: \binom{n+k-1}{k-1}
- Example: Light in a cavity
Fermi-Dirac Statistics (Restricted, Indistinguishable)
- Think of electrons in atoms
- Maximum one particle per state
- Total arrangements: \binom{k}{n} if n \leq k, 0 otherwise
- Example: Electron configuration in atoms
Maxwell-Boltzmann Statistics (Classical, Distinguishable)
- Think of gas molecules
- Particles are distinct
- Total arrangements: k^n
- Example: Air molecules in a room
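For small cases the three formulas can be checked by brute force. The sketch below starts from all assignments of labeled balls to boxes and then discards information step by step; the function name `count_states` is illustrative, and the approach is only practical for small n and k.

```python
from itertools import product
from math import comb

def count_states(n, k):
    """Return (Maxwell-Boltzmann, Bose-Einstein, Fermi-Dirac) counts."""
    # Maxwell-Boltzmann: each labeled ball independently picks a box.
    assignments = list(product(range(k), repeat=n))
    # Bose-Einstein: drop the labels, keeping only which boxes are used how often.
    unlabeled = {tuple(sorted(a)) for a in assignments}
    # Fermi-Dirac: unlabeled, and no box may be used more than once.
    exclusive = {u for u in unlabeled if len(set(u)) == n}
    return len(assignments), len(unlabeled), len(exclusive)

n, k = 2, 3
mb, be, fd = count_states(n, k)
print(mb, be, fd)                                     # 9 6 3
print(k ** n, comb(n + k - 1, k - 1), comb(k, n))     # 9 6 3
```

With two particles and three states, the nine classical arrangements collapse to six when particles are indistinguishable, and to three when double occupancy is forbidden.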
An Intuitive Bridge to Physics
To understand why these statistics matter, consider three real scenarios:
Photons in a Laser (Bose-Einstein)
Imagine shining a laser into a mirror cavity. Photons are happy to bunch together in the same quantum state - they’re “social particles.” This is why lasers can produce intense, coherent light.
Electrons in an Atom (Fermi-Dirac)
Electrons are “antisocial” - they refuse to share quantum states (Pauli exclusion principle). This explains the shell structure of atoms and why ordinary matter resists being compressed.
Gas Molecules in a Room (Maxwell-Boltzmann)
Air molecules bounce around randomly, and in the classical picture we treat them as distinguishable. This gives us the familiar gas laws and diffusion.
The Power of the Stars and Bars Method
The stars and bars visualization helps us understand more complex problems. For instance, if we have restrictions on box occupancy:
- At least one ball per box:
- First put one ball in each box
- Then distribute remaining balls freely
- Formula: \binom{n-k+k-1}{k-1} = \binom{n-1}{k-1}
- Maximum capacity per box:
- Use inclusion-exclusion principle
- Subtract arrangements that violate constraints
- More complex but same underlying principle
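Both restrictions can be sketched in code and checked against brute-force enumeration. The inclusion-exclusion formula used in `at_most` is the standard one for a common per-box capacity; the function names and the small test case (n = 5, k = 3, capacity 2) are illustrative choices.

```python
from itertools import product
from math import comb

def at_least_one(n, k):
    """Ways to put n identical balls in k boxes with no box empty."""
    return comb(n - 1, k - 1) if n >= k else 0

def at_most(n, k, cap):
    """Ways to put n identical balls in k boxes, each holding <= cap."""
    total = 0
    for j in range(k + 1):
        # Force j chosen boxes to overflow by pre-placing cap + 1 balls in each.
        remaining = n - j * (cap + 1)
        if remaining < 0:
            break
        total += (-1) ** j * comb(k, j) * comb(remaining + k - 1, k - 1)
    return total

def brute_force(n, k, ok):
    """Count occupancy vectors summing to n whose entries all satisfy ok."""
    return sum(1 for x in product(range(n + 1), repeat=k)
               if sum(x) == n and all(ok(v) for v in x))

n, k, cap = 5, 3, 2
print(at_least_one(n, k), brute_force(n, k, lambda v: v >= 1))    # 6 6
print(at_most(n, k, cap), brute_force(n, k, lambda v: v <= cap))  # 3 3
```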
Connection to Partition Problems
This same framework helps us solve other important problems:
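- Integer Partitions: How many ways can we write n as a sum of positive integers when the order of the terms doesn’t matter?
- Like distributing n identical balls into identical (unlabeled) boxes
- Each non-empty box represents one term in the sum
- Compositions: How many ways can we write n as an ordered sum of positive integers?
- Like distributing n identical balls into distinguishable boxes with at least one ball per box
- Order matters here, so stars and bars applies directly: there are \binom{n-1}{k-1} compositions of n into exactly k parts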
This connection between simple counting and profound physical phenomena shows the deep unity of mathematics and physics. The same principles that help us count poker hands and lottery combinations govern the behavior of the universe at its most fundamental level.