Introduction to Probability

Examples, Intuition, and Interactive Demos

Session roadmap

Probability language ──→ Bayes updating ──→ Test accuracy ──→ Simulation variability
    events + conditioning    priors + evidence   prevalence effects   simulation averages
           ╲                                      ╱
            ╲──── NBC: Bayes for classification ─╱

Five topics. One thread: how to reason consistently under uncertainty.

Pre-reading check

Hands up: true or false?

  1. P(not A) = 1 − P(A)    true

  2. “P(A | B) means: probability of B given A”    false

  3. “A test with 99% sensitivity means a positive result is 99% likely correct”    false

Warm-up: prior and posterior

Warm-up (5 min)

  1. Which number is your starting belief?
  2. Which numbers describe test quality?
  3. Which number changes after a positive result?
  4. Why can the updated probability still be low?

Probability language

What you must retain

  • Probabilities are numbers between 0 and 1
  • P(not A) = 1 − P(A) — complement rule
  • P(A | B) means: probability of A given B
  • Independence: knowing B does not change the probability of A

Jeffreys: axioms govern reasoning, not measurement

Jeffreys’ definition: probability is the degree of confidence that we may reasonably have in a proposition.

Frequentist definitions (von Mises, Fisher) claim probability is a frequency — measurement is built into the definition.

Jeffreys’ framework claims nothing about measurement — only about reasoning:

“Once you have accepted any probabilities, you must reason consistently with them.”

Bayes’ theorem follows as a consequence of consistent reasoning — not as an arbitrary formula.

Three rules you will use today

\[ \text{P}(\neg A) \;=\; 1 - \text{P}(A) \quad\quad \leftarrow \text{complement rule} \]

\[ \text{P}(A \mid B) \;=\; \frac{\text{P}(A \cap B)}{\text{P}(B)} \quad\quad \leftarrow \text{conditional probability ("given")} \]

\[ \text{P}(A \mid B) \;=\; \text{P}(A) \quad\quad \leftarrow \text{independence: B tells you nothing about A} \]

These three rules are sufficient for everything today.
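A quick way to check all three rules is to enumerate a small sample space. The dice example below is mine, not from the session; it verifies the complement rule, the conditional-probability definition, and one case of independence with exact fractions:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely rolls of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) as an exact fraction over the 36 outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def A(w): return w[0] % 2 == 0      # event: first die is even
def B(w): return w[0] + w[1] == 7   # event: the sum is seven

# Complement rule: P(not A) = 1 - P(A)
assert prob(lambda w: not A(w)) == 1 - prob(A)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_A_given_B = prob(lambda w: A(w) and B(w)) / prob(B)

# Independence: knowing the sum is 7 tells you nothing about parity
assert p_A_given_B == prob(A)       # both equal 1/2
print(p_A_given_B)                  # 1/2
```

The same three lines of reasoning carry every calculation in the rest of the session.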

Story: two sacks of coins

There are two sacks of coins.

  • Sack 1: 150 gold, 50 silver   (75% gold)
  • Sack 2: 100 gold, 200 silver   (33% gold)

A blindfolded person picks one sack at random and draws one coin. You observe a gold coin.

Question: which sack is now more plausible?

Predict the direction first

Say this out loud before touching the app:

  • “Gold is more common in Sack 1.”
  • “So observing gold should increase my belief in Sack 1.”

Coin Sacks App

Task

  1. Start with equal priors. Which sack is favored after seeing gold?
  2. Verify: P(Sack 1 | Gold) ≈ 0.692.
  3. Change the prior so Sack 2 is preferred 2:1. Does the conclusion change?
  4. Make the two sacks nearly identical (same coin mix). What happens to the update?

Explain the update in words before reading numbers.
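If you want to verify step 2 away from the app, the update is one application of Bayes' theorem. A minimal sketch (the function name `posterior_sack1` is mine, not from the app):

```python
def posterior_sack1(prior1, p_gold1, p_gold2):
    """P(Sack 1 | Gold) via Bayes' theorem with two hypotheses."""
    evidence = prior1 * p_gold1 + (1 - prior1) * p_gold2
    return prior1 * p_gold1 / evidence

# Equal priors; Sack 1 is 150/200 gold, Sack 2 is 100/300 gold.
print(round(posterior_sack1(0.5, 150 / 200, 100 / 300), 3))   # 0.692

# Step 3: prior 2:1 in favor of Sack 2, i.e. P(Sack 1) = 1/3.
print(round(posterior_sack1(1 / 3, 150 / 200, 100 / 300), 3)) # 0.529
```

Note that even a prior against Sack 1 is overcome by the gold observation, but only partially: the posterior direction is the same, the magnitude is not.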

Bayes’ theorem: annotated

\[ \underbrace{\text{P}(A \mid B)}_{\text{posterior}} \;=\; \frac{ \underbrace{\text{P}(B \mid A)}_{\text{likelihood}} \;\times\; \underbrace{\text{P}(A)}_{\text{prior}} }{ \underbrace{\text{P}(B)}_{\text{evidence}} } \]

Story: screening for a rare condition

A financial fraud detection system has:

  • Prevalence of fraud: 0.2% (2 in every 1,000 transactions)
  • Sensitivity: 99% (correctly flags 99% of real fraud)
  • Specificity: 99% (correctly clears 99% of legitimate transactions)

Question: if the system flags a transaction as fraudulent, how likely is it actually fraud?

Positive Test App

Task

  1. Set prevalence = 0.2%, sensitivity = 99%, specificity = 99%.
  2. Record the result. Compare to your prediction from before.
  3. Increase prevalence to 2%. What changes?
  4. Reset. Increase specificity to 99.9%. What changes?
  5. Write one sentence starting: “A flagged transaction means…”

What to say about a positive result

Use language like this:

  • “The test is good, but fraud is rare.”
  • “A flagged transaction increases the probability, but it may still be far from certain.”
  • “To know how likely it is after a positive result, I need prevalence and test quality.”

Avoid saying:

  • “The test is 99% accurate, so a flagged transaction is 99% likely fraud.”

The fraction of true positives among all positives (PPV) depends critically on prevalence.
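That dependence is easy to check numerically. A sketch of the PPV calculation using the task's own numbers (the helper `ppv` is mine, not part of the app):

```python
def ppv(prevalence, sensitivity, specificity):
    """P(fraud | flagged): true positives over all positives."""
    tp = prevalence * sensitivity            # true-positive rate overall
    fp = (1 - prevalence) * (1 - specificity)  # false-positive rate overall
    return tp / (tp + fp)

print(round(ppv(0.002, 0.99, 0.99), 3))    # 0.166: most flags are false alarms
print(round(ppv(0.02, 0.99, 0.99), 3))     # higher prevalence raises PPV
print(round(ppv(0.002, 0.99, 0.999), 3))   # higher specificity raises PPV
```

With 99% sensitivity and 99% specificity at 0.2% prevalence, only about 1 flagged transaction in 6 is real fraud.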

Story: classifying a car’s origin

Cars93 dataset: 93 cars. Predict origin (USA / non-USA) from one feature: Man.trans.avail (Yes / No).

  Man.trans.avail   P(feature | USA)   P(feature | non-USA)
  No                0.542              0.133
  Yes               0.458              0.867

Prior: P(USA) = 0.516. For a new car with Man.trans.avail = No:

  • USA score \(\;\propto\;\) 0.516 × 0.542 = 0.280
  • non-USA score \(\;\propto\;\) 0.484 × 0.133 = 0.064

Predicted: USA (P ≈ 81%).
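The hand computation above can be replayed in a few lines to confirm the ≈81% posterior:

```python
# Priors and likelihoods from the Cars93 table above.
prior_usa, prior_non = 0.516, 0.484
lik_no_usa, lik_no_non = 0.542, 0.133   # P(Man.trans.avail = No | class)

score_usa = prior_usa * lik_no_usa      # ~0.280
score_non = prior_non * lik_no_non      # ~0.064

# Normalize the scores so they sum to 1 (this plays the role of P(B)).
p_usa = score_usa / (score_usa + score_non)
print(round(p_usa, 2))                  # 0.81 -> predict USA
```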

Classifier App — Cars93

Task — select Origin + Man.trans.avail

  1. Read the prior probabilities from the output. What do they represent?
  2. Using the likelihood table, verify the prediction for a car with Man.trans.avail = No. Compute the scores by hand.
  3. Set training set to 70%. Find sensitivity and specificity in the output. What do these numbers tell you about the classifier?
  4. Use Shuffle Data several times. What changes: priors, sensitivity, specificity? Why? What does this variability remind you of from earlier today?

What is Naive Bayes?

One sentence:

Naive Bayes is Bayes’ theorem applied once per feature, assuming features are independent given the class.
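As a sketch of that sentence: one prior per class, one likelihood factor per feature, multiplied as if the features were independent. The Cars93 priors are reused below, but the second feature and all likelihood values are invented for illustration:

```python
# Toy Naive Bayes: "Bayes once per feature" as a plain product.
# Likelihood numbers are illustrative, not estimated from Cars93.
priors = {"USA": 0.516, "non-USA": 0.484}
likelihoods = {
    "USA":     {"man_trans=No": 0.542, "airbags=Yes": 0.60},
    "non-USA": {"man_trans=No": 0.133, "airbags=Yes": 0.70},
}

def predict(features):
    """Posterior over classes, assuming feature independence given class."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for f in features:          # the "naive" step: multiply per feature
            score *= likelihoods[cls][f]
        scores[cls] = score
    total = sum(scores.values())    # normalize, as P(B) does in Bayes
    return {cls: s / total for cls, s in scores.items()}

print(predict(["man_trans=No", "airbags=Yes"]))
```

Adding a feature adds one factor per class; nothing else in the algorithm changes.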

Story: which hospital shows more extreme days?

Two hospitals record the proportion of boys born each day:

  • Large hospital: ~45 births per day
  • Small hospital: ~15 births per day

Question: which hospital more often records days with more than 60% boys?

\[ \bar{X}_n \;\xrightarrow{\;n \to \infty\;}\; \mu \quad\quad \text{(Law of Large Numbers)} \]

Predict before simulating.

Hospital Simulation App

Task

  1. Predict first: large or small hospital — which shows more extreme days?
  2. Run the simulation. Was your prediction correct?
  3. Increase the number of simulated days. What stabilizes?
  4. Change the threshold (60% → 80%). What changes?
  5. Explain using the words variability and sample size.
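The simulation's long-run answer can also be computed exactly with the binomial distribution, assuming each birth is a boy with probability 0.5 (a modelling assumption, not a fact from the data):

```python
from math import comb

def p_extreme(n_births, threshold=0.6, p_boy=0.5):
    """P(proportion of boys > threshold) on one day with n_births births."""
    cutoff = int(n_births * threshold)   # need strictly more boys than this
    return sum(
        comb(n_births, k) * p_boy**k * (1 - p_boy)**(n_births - k)
        for k in range(cutoff + 1, n_births + 1)
    )

small, large = p_extreme(15), p_extreme(45)
print(round(small, 3), round(large, 3))
assert small > large   # the small hospital sees extreme days more often
```

Larger samples fluctuate less around the mean, so the large hospital crosses the 60% line far less often; this is the Law of Large Numbers at work.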

Key ideas

  Topic                       What to retain
  Probability language        P(A|B) is conditional; independence means B tells you nothing about A
  Jeffreys                    Axioms govern reasoning, not measurement; Bayes follows from consistency
  Bayes’ theorem              Posterior ∝ likelihood × prior; P(B) normalizes
  Sensitivity / specificity   Test accuracy ≠ PPV; prevalence dominates at low base rates
  Naive Bayes                 Bayes per feature, independence assumed; “naive” is intentional
  Law of Large Numbers        Sample means converge to μ; small samples fluctuate more

Exit problem (pairs, 5 min)

A rapid test for a rare infection has:

  • Prevalence: 0.5%
  • Sensitivity: 98%
  • Specificity: 95%

You test 1,000 people.

  1. How many false positives do you expect?
  2. What is the probability a positive result is a true infection (PPV)?
  3. If you repeated this tomorrow with 1,000 new people, would the numbers be identical? Why not?