Warm-up: prior and posterior
Warm-up (5 min)
- Which number is your starting belief?
- Which numbers describe test quality?
- Which number changes after a positive result?
- Why can the updated probability still be low?
Jeffreys: axioms govern reasoning, not measurement
Jeffreys’ definition: probability is the degree of confidence that we may reasonably have in a proposition.
Frequentist definitions (von Mises, Fisher) claim probability is a frequency — measurement is built into the definition.
Jeffreys’ framework claims nothing about measurement — only about reasoning:
“Once you have accepted any probabilities, you must reason consistently with them.”
Bayes’ theorem follows as a consequence of consistent reasoning — not as an arbitrary formula.
Three rules you will use today
\[
\text{P}(\neg A) \;=\; 1 - \text{P}(A)
\quad\quad \leftarrow \text{complement rule}
\]
\[
\text{P}(A \mid B) \;=\; \frac{\text{P}(A \cap B)}{\text{P}(B)}
\quad\quad \leftarrow \text{conditional probability ("given")}
\]
\[
\text{P}(A \mid B) \;=\; \text{P}(A)
\quad\quad \leftarrow \text{independence: B tells you nothing about A}
\]
These three rules are sufficient for everything today.
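The three rules can be sanity-checked on a toy joint distribution (the numbers below are illustrative and chosen so that A and B happen to be independent):

```python
# Toy joint distribution over (A, B); probabilities are illustrative.
p = {("A", "B"): 0.12, ("A", "notB"): 0.28,
     ("notA", "B"): 0.18, ("notA", "notB"): 0.42}

p_A = p[("A", "B")] + p[("A", "notB")]   # marginal P(A) = 0.40
p_B = p[("A", "B")] + p[("notA", "B")]   # marginal P(B) = 0.30
p_A_given_B = p[("A", "B")] / p_B        # conditional rule: P(A|B) = P(A∩B)/P(B)

assert abs((1 - p_A) - 0.60) < 1e-9      # complement rule: P(¬A) = 1 - P(A)
assert abs(p_A_given_B - p_A) < 1e-9     # independence: P(A|B) = P(A)
```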
Story: two sacks of coins
There are two sacks of coins.
- Sack 1: 150 gold, 50 silver (75% gold)
- Sack 2: 100 gold, 200 silver (33% gold)
A blindfolded person picks one sack at random and draws one coin. You observe a gold coin.
Question: which sack is now more plausible?
Bayes’ theorem: annotated
\[
\underbrace{\text{P}(A \mid B)}_{\text{posterior}}
\;=\;
\frac{
\underbrace{\text{P}(B \mid A)}_{\text{likelihood}}
\;\times\;
\underbrace{\text{P}(A)}_{\text{prior}}
}{
\underbrace{\text{P}(B)}_{\text{evidence}}
}
\]
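Applying the annotated formula to the two sacks (a plain-Python sketch; equal prior for the two sacks since the pick is random):

```python
# Prior: the blindfolded pick makes either sack equally likely.
prior = {"sack1": 0.5, "sack2": 0.5}
# Likelihood of drawing gold from each sack.
likelihood_gold = {"sack1": 150 / 200, "sack2": 100 / 300}

# Unnormalized posterior: likelihood × prior.
scores = {s: likelihood_gold[s] * prior[s] for s in prior}
evidence = sum(scores.values())                     # P(gold)
posterior = {s: scores[s] / evidence for s in scores}

print(round(posterior["sack1"], 3))  # 0.692: Sack 1 is now more plausible
```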
Story: screening for a rare condition
A financial fraud detection system has:
- Prevalence of fraud: 0.2% (2 in every 1,000 transactions)
- Sensitivity: 99% (correctly flags 99% of real fraud)
- Specificity: 99% (correctly clears 99% of legitimate transactions)
Question: if the system flags a transaction as fraudulent, how likely is it actually fraud?
Positive Test App
Task
- Set prevalence = 0.2%, sensitivity = 99%, specificity = 99%.
- Record the result. Compare to your prediction from before.
- Increase prevalence to 2%. What changes?
- Reset. Increase specificity to 99.9%. What changes?
- Write one sentence starting: “A flagged transaction means…”
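After working through the app, the flagged-transaction probability can be checked directly with Bayes' theorem (a minimal sketch using the slide's numbers):

```python
prevalence = 0.002    # P(fraud)
sensitivity = 0.99    # P(flag | fraud)
specificity = 0.99    # P(no flag | legitimate)

# Evidence: P(flag) = P(flag|fraud)P(fraud) + P(flag|legit)P(legit)
p_flag = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
# Posterior: P(fraud | flag)
ppv = prevalence * sensitivity / p_flag

print(round(ppv, 3))  # 0.166: most flagged transactions are NOT fraud
```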
Story: classifying a car’s origin
Cars93 dataset: 93 cars. Predict origin (USA / non-USA) from one feature: Man.trans.avail (Yes / No).
Likelihood table, P(Man.trans.avail | Origin):

| Man.trans.avail | USA   | non-USA |
|-----------------|-------|---------|
| No              | 0.542 | 0.133   |
| Yes             | 0.458 | 0.867   |
Prior: P(USA) = 0.516. For a new car with Man.trans.avail = No:
- USA score \(\;\propto\;\) 0.516 × 0.542 = 0.280
- non-USA score \(\;\propto\;\) 0.484 × 0.133 = 0.064
Predicted origin: USA (posterior ≈ 81% after normalizing the two scores).
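The hand computation above can be reproduced in a few lines (Python sketch; the prior and likelihood values are the ones quoted from the Cars93 output):

```python
# Naive-Bayes-style scoring for a car with Man.trans.avail = "No".
prior = {"USA": 0.516, "non-USA": 0.484}
lik_no = {"USA": 0.542, "non-USA": 0.133}   # P(No | origin)

scores = {o: prior[o] * lik_no[o] for o in prior}   # unnormalized scores
total = sum(scores.values())                        # P(No)
posterior = {o: scores[o] / total for o in scores}

print({o: round(p, 3) for o, p in posterior.items()})
# {'USA': 0.813, 'non-USA': 0.187}
```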
Classifier App — Cars93
Task — select Origin + Man.trans.avail
- Read the prior probabilities from the output. What do they represent?
- Using the likelihood table, verify the prediction for a car with
Man.trans.avail = No. Compute the scores by hand.
- Set training set to 70%. Find sensitivity and specificity in the output. What do these numbers tell you about the classifier?
- Use Shuffle Data several times. What changes: priors, sensitivity, specificity? Why? What does this variability remind you of from earlier today?
Story: which hospital shows more extreme days?
Two hospitals record the proportion of boys born each day:
- Large hospital: ~45 births per day
- Small hospital: ~15 births per day
Question: which hospital more often records days with more than 60% boys?
\[
\bar{X}_n \;\xrightarrow{\;n \to \infty\;}\; \mu
\quad\quad \text{(Law of Large Numbers)}
\]
Predict before simulating.
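A minimal simulation sketch (plain Python, fixed seed; each birth is modeled as a boy with probability 0.5, and the daily birth counts are taken as exactly 45 and 15 rather than "about"):

```python
import random

random.seed(42)

def extreme_day_rate(births_per_day, days=10_000, threshold=0.60):
    """Fraction of simulated days on which more than `threshold`
    of the day's births are boys."""
    extreme = 0
    for _ in range(days):
        boys = sum(random.random() < 0.5 for _ in range(births_per_day))
        if boys / births_per_day > threshold:
            extreme += 1
    return extreme / days

rate_large = extreme_day_rate(45)   # large hospital
rate_small = extreme_day_rate(15)   # small hospital
print(rate_large, rate_small)
```

Smaller daily samples fluctuate more around the mean, so the small hospital records extreme days more often, which is exactly what the Law of Large Numbers predicts.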
Exit problem (pairs, 5 min)
A rapid test for a rare infection has:
- Prevalence: 0.5%
- Sensitivity: 98%
- Specificity: 95%
You test 1,000 people.
- How many false positives do you expect?
- What is the probability a positive result is a true infection (PPV)?
- If you repeated this tomorrow with 1,000 new people, would the numbers be identical? Why not?