11 Problems
11.1 Bayes Theorem
11.1.1 Task 1
Consider the situation where a patient exhibits symptoms that are typical for a particular disease. The disease is relatively rare in this population, with a prevalence of 0.3% (i.e. it affects 3 out of every 1000 persons). A pharmaceutical company developed a diagnostic test that costs €100 and has a reported sensitivity of 90% (i.e. the probability of testing positive, given that the patient has the disease). Based on historical data, the company computed that there is an overall probability of 7% of testing positive (and 93% of testing negative). Should the patient spend €100?
The following probabilities are available:
- \(P(+) = 7\%\)
- \(P(+ | D) = 90\%\)
- \(P(D) = 0.3\%\)
where \(+\) stands for “testing positive” and \(D\) for “having the disease.”
According to Bayes’ Theorem, the probability of interest can be found as follows:
\(P(D | +) = \frac{P(+ | D) \times P(D)}{P(+)} = \frac{0.9 \times 0.003}{0.07} \simeq 3.86\%\)
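The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the variable names are our own. It also derives a quantity the text does not state explicitly: the false-positive rate \(P(+ | \neg D)\) implied by the three given probabilities via the law of total probability.

```python
# Task 1: posterior probability of disease given a positive test (Bayes' Theorem).
p_pos = 0.07          # P(+): overall probability of testing positive
p_pos_given_d = 0.90  # P(+|D): sensitivity
p_d = 0.003           # P(D): prevalence

p_d_given_pos = p_pos_given_d * p_d / p_pos
print(f"P(D|+) = {p_d_given_pos:.4f}")  # prints P(D|+) = 0.0386

# Implied false-positive rate P(+|not D), from the law of total probability:
# P(+) = P(+|D) P(D) + P(+|not D) (1 - P(D))
p_pos_given_not_d = (p_pos - p_pos_given_d * p_d) / (1 - p_d)
print(f"P(+|not D) = {p_pos_given_not_d:.4f}")
```

The implied false-positive rate (about 6.75%) is more than twenty times the prevalence, which is why most positive results come from healthy patients.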
Without the test, the patient has a probability of 0.3% of having the disease. If the patient undergoes the test and it comes back positive, this probability increases to 3.86%, which is substantially higher than 0.3% but still small in absolute terms. Hence, the real question is whether this increase in probability is worth €100. The answer depends on the utility that the patient attributes to the posterior probability. If the disease is untreatable, or if the treatment has severe side effects (or decreases the patient’s quality of life), it is quite likely that the patient is better off not spending the money. How would you decide?
A practical way to formalize this is an expected-loss comparison: compute the expected loss without testing, then compare it to the expected loss with testing (including the €100 test cost and separate losses for false positives/false negatives after observing test outcomes). The preferred decision is the one with the lower expected loss.
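A minimal sketch of such an expected-loss comparison is shown below. All loss values in euros are hypothetical assumptions chosen only for illustration, and so is the restriction to three simple policies; correct decisions are assumed to carry no loss apart from the test cost. The function name `expected_losses` is our own.

```python
# Hypothetical expected-loss comparison for Task 1.
# Loss values (EUR) are illustrative assumptions, not taken from the text.
P_D = 0.003                               # prevalence P(D)
SENS = 0.90                               # sensitivity P(+|D)
P_POS = 0.07                              # P(+)
FPR = (P_POS - SENS * P_D) / (1 - P_D)    # implied P(+|not D)
TEST_COST = 100.0


def expected_losses(loss_missed: float, loss_false_alarm: float) -> dict[str, float]:
    """Expected loss of three simple policies (correct decisions cost nothing).

    loss_missed: loss of leaving a diseased patient untreated (hypothetical)
    loss_false_alarm: loss of treating a healthy patient (hypothetical)
    """
    return {
        "no test, no treatment": P_D * loss_missed,
        "no test, treat": (1 - P_D) * loss_false_alarm,
        "test, treat if positive": TEST_COST
        + P_D * (1 - SENS) * loss_missed            # false negatives
        + (1 - P_D) * FPR * loss_false_alarm,       # false positives
    }


for missed in (50_000, 200_000):
    losses = expected_losses(missed, loss_false_alarm=2_000)
    best = min(losses, key=losses.get)              # policy with lowest expected loss
    print(f"loss if disease is missed = {missed}: best policy = {best}")
```

With these made-up numbers, skipping the test is preferred when a missed diagnosis costs €50,000, while testing becomes preferable at €200,000. The decision hinges on the utilities, exactly as argued above.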
11.1.2 Task 2
Considering the Likelihoods of Table 9.2 and a prior probability \(\text{P}(real) = 0.5\), what is
\(\text{P}(real|\text{secret wedding in royal family})\)?
The result depends on the choice of \(\alpha\). Assuming that \(\alpha = 1\), we obtain the following (unnormalized) posterior probability scores:
\[ \begin{gather*} \text{P}(real | \text{secret wedding in royal family}) \propto \\ \text{P} (real) \times \text{P} (\text{secret wedding}_{\alpha = 1} | real) \times \text{P} (\text{royal family}_{\alpha = 1} | real) = \\ 0.5 \times 0.094 \times 0.014 \simeq 0.000658 \end{gather*} \]
and
\[ \begin{gather*} \text{P}(fake | \text{secret wedding in royal family}) \propto \\ \text{P} (fake) \times \text{P} (\text{secret wedding}_{\alpha = 1} | fake) \times \text{P} (\text{royal family}_{\alpha = 1}| fake) = \\ 0.5 \times 0.040 \times 0.088 \simeq 0.00176 \end{gather*} \]
Normalizing these two scores gives the actual posterior probability
\(\text{P}(real|\text{secret wedding in royal family}) = \frac{0.000658}{0.000658+0.00176} \simeq 0.272\)
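The word combination “secret wedding” is more likely in real news articles than in fake ones, whereas “royal family” is much less likely in real news articles. The second word combination dominates: its likelihood ratio (real versus fake) is \(0.014 / 0.088 \simeq 0.16\), whereas that of “secret wedding” is only \(0.094 / 0.040 \simeq 2.35\). For this reason the posterior probability that this news article is real is rather low (0.272).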
Note: We ignore the word “in” because there are no likelihoods available for this word, nor would it make sense to compute them.
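The Naive Bayes computation of this task can be sketched in Python as follows. The likelihoods are the \(\alpha = 1\) values quoted above from Table 9.2; the dictionary layout and the function name `posterior_real` are our own illustrative choices.

```python
# Naive Bayes posterior for "secret wedding in royal family" (Task 2).
# Likelihoods (alpha = 1) as quoted in the text from Table 9.2; "in" is ignored.
likelihoods = {
    "real": {"secret wedding": 0.094, "royal family": 0.014},
    "fake": {"secret wedding": 0.040, "royal family": 0.088},
}


def posterior_real(prior_real: float, phrases: list[str]) -> float:
    """Return P(real | phrases), assuming the phrases are conditionally independent."""
    priors = {"real": prior_real, "fake": 1 - prior_real}
    scores = {}
    for label, prior in priors.items():
        score = prior
        for phrase in phrases:
            score *= likelihoods[label][phrase]
        scores[label] = score                      # unnormalized posterior score
    return scores["real"] / (scores["real"] + scores["fake"])


print(round(posterior_real(0.5, ["secret wedding", "royal family"]), 3))  # 0.272
```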
11.1.3 Task 3
In the previous task, we selected an arbitrary prior probability. What would happen if we use the data-based prior instead?
Assuming (again) that \(\alpha = 1\), and using the data-based prior that was discussed in the example (i.e. \(\text{P}(real) = 0.736\)), we obtain the following posterior probability scores
\[ \begin{gather*} \text{P}(real | \text{secret wedding in royal family}) \propto \\ \text{P} (real) \times \text{P} (\text{secret wedding}_{\alpha = 1} | real) \times \text{P} (\text{royal family}_{\alpha = 1} | real) = \\ 0.736 \times 0.094 \times 0.014 \simeq 0.000968576 \end{gather*} \]
and
\[ \begin{gather*} \text{P}(fake | \text{secret wedding in royal family}) \propto \\ \text{P} (fake) \times \text{P} (\text{secret wedding}_{\alpha = 1} | fake) \times \text{P} (\text{royal family}_{\alpha = 1}| fake) = \\ (1 - 0.736) \times 0.040 \times 0.088 \simeq 0.00092928 \end{gather*} \]
Normalizing these two scores gives the actual posterior probability
\(\text{P}(real|\text{secret wedding in royal family}) = \frac{0.000968576}{0.000968576+0.00092928} \simeq 0.51\)
It can be concluded that the choice of the prior probability can have an important impact on the prediction that is made. Empirically speaking, real news is considerably more prevalent than fake news (\(\text{P}(real) = 0.736\)), and this prior is just large enough to tip the balance. Therefore, we change our mind and conclude that it is slightly more probable that the news article is real.
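The sensitivity to the prior can be made explicit with the odds form of Bayes’ Theorem: posterior odds = prior odds × likelihood ratio. The sketch below (function and variable names are our own) evaluates the posterior for a few priors and computes the break-even prior at which the posterior equals exactly 0.5.

```python
# How the posterior of Tasks 2 and 3 depends on the prior P(real).
# Likelihood ratio real : fake for the two phrases (alpha = 1 values from the text).
lr = (0.094 * 0.014) / (0.040 * 0.088)


def posterior_from_prior(prior_real: float) -> float:
    prior_odds = prior_real / (1 - prior_real)
    post_odds = prior_odds * lr          # odds form of Bayes' Theorem
    return post_odds / (1 + post_odds)   # convert odds back to a probability


for prior in (0.5, 0.7, 0.736, 0.8):
    print(f"P(real) = {prior:.3f} -> P(real | text) = {posterior_from_prior(prior):.3f}")

# Posterior odds equal 1 when prior odds = 1 / lr, i.e. prior = 1 / (1 + lr).
break_even = 1 / (1 + lr)
print(f"break-even prior = {break_even:.3f}")
```

The break-even prior is about 0.728, only slightly below the data-based prior of 0.736, which explains why the posterior in Task 3 ends up so close to 0.5.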
11.1.4 Task 4
Suppose that the Naive Bayes model from the previous task has a sensitivity of 93% (for detecting real news) and a specificity of 87%. What is the probability that a news article that is predicted as real (such as the one from the previous task, or any other article) is actually real?
We use the simple formulation of Bayes’ Theorem (i.e. Equation 7.3) which states that
\[ \begin{equation} \text{P}(real | prediction) = \frac{\text{P}(prediction | real) \text{P}(real)}{\text{P}(prediction)} \end{equation} \]
which, after expanding the denominator with the law of total probability, becomes
\[ \begin{equation} \text{P}(real | +) = \frac{\text{P}(+ | real) \text{P}(real)}{\text{P}(+ | real) \text{P}(real) + \text{P}(+ | fake) \text{P}(fake) } \end{equation} \]
(Note: + represents a prediction that the news article is real, so \(\text{P}(+ | fake) = 1 - \text{specificity}\).)
or, numerically,
\[ \begin{equation} \text{P}(real | +) = \frac{0.93 \times 0.736}{0.93 \times 0.736 + (1 - 0.87) \times (1 - 0.736) } \simeq 95.2\% \end{equation} \]
The same result can be obtained through the odds formula (i.e. Equation 7.4)
\[ \begin{equation} \frac{\text{P}(real | +)}{\text{P}(fake | +)} = \frac{\text{P}(+ | real)}{\text{P}(+ | fake)} \frac{\text{P}(real)}{\text{P}(fake)} = \frac{0.93}{(1 - 0.87)} \frac{0.736}{(1 - 0.736)} = \frac{0.68448}{0.03432} \end{equation} \]
which leads to a probability of \(0.68448 / (0.68448 + 0.03432) \simeq 95.2\%\).
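Both routes can be reproduced in a short Python sketch (variable names are our own), confirming that the probability form and the odds form give the same positive predictive value.

```python
# Probability that an article predicted as real is real (Task 4), two equivalent ways.
sensitivity = 0.93     # P(+ | real), where "+" = predicted real
specificity = 0.87     # P(- | fake), so P(+ | fake) = 1 - specificity
prior_real = 0.736     # data-based prior P(real)

# 1) Bayes' Theorem with the law of total probability in the denominator
num = sensitivity * prior_real
den = num + (1 - specificity) * (1 - prior_real)
ppv = num / den

# 2) Odds form: posterior odds = likelihood ratio * prior odds
post_odds = (sensitivity / (1 - specificity)) * (prior_real / (1 - prior_real))
ppv_odds = post_odds / (1 + post_odds)

print(f"{ppv:.3f} {ppv_odds:.3f}")  # both approximately 0.952
```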
11.2 Law of large numbers
11.2.1 Task 5
We consider Problem 13 in Chapter 1 of Grinstead and Snell (2006) (based on the original study by Tversky and Kahneman (1974)):
The psychologist Tversky and his colleagues say that about four out of five people will answer (a) to the following question:
A certain town is served by two hospitals. In the larger hospital, about 45 babies are born each day, and in the smaller hospital, 15 babies are born each day. Although the overall proportion of boys is about 50 percent, the actual proportion at either hospital may be more or less than 50 percent on any day.
At the end of a year, which hospital will have a greater number of days on which more than 60 percent of the babies born were boys?
- (a) the large hospital
- (b) the small hospital
- (c) neither -- the number of days will be about the same.
Assume that the probability of a baby boy is .5 (actual estimates make this more like .513). Decide, based on simulation, what the right answer is to the question. Can you suggest why so many people go wrong?
Investigate this spreadsheet and figure out how the solution works.
The spreadsheet simulates 45 births in the large hospital and 15 births in the small hospital on each of 365 days, using random numbers from the Uniform Distribution generated by the function RAND() (which returns a random number between 0 and 1). The formula
=IF(RAND()>=0.5,"Boy","Girl")
is used to decide whether the baby is a boy or a girl (i.e. the Uniform Distribution is converted into a Bernoulli Distribution -- these distributions will be explained at a later stage).
Based on the simulated births, it is easy to count the number of boys and girls for each day. If boys make up more than 60% of the births on a particular day, a binary “success” variable is set to 1 (column AY for the large hospital and column U for the small one).
Now it is easy to compute the total number of “successes” of these daily Bernoulli trials (given the assumption that the probability of a boy being born is 50%) and divide it by the number of days. This ratio is an estimate of the conditional probability that we want to find (as is described in the Theorems from Jeffreys’ axiom system).
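The spreadsheet logic can be mirrored in Python. This is a minimal sketch, not the spreadsheet itself: the function name `estimate_share_of_days`, its parameters and the fixed seed are our own assumptions. Each birth is a Uniform(0, 1) draw turned into a boy or a girl, and the success share over all days estimates the probability.

```python
import random


def estimate_share_of_days(births_per_day: int, days: int, threshold: float = 0.6,
                           p_boy: float = 0.5, seed: int = 1) -> float:
    """Fraction of simulated days on which the share of boys exceeds `threshold`."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    successes = 0
    for _ in range(days):
        # Each birth: a Uniform(0, 1) draw turned into Boy/Girl, as in
        # =IF(RAND()>=0.5,"Boy","Girl") when p_boy = 0.5
        boys = sum(1 for _ in range(births_per_day) if rng.random() >= 1 - p_boy)
        # Daily Bernoulli "success": strictly more than 60% boys
        if boys / births_per_day > threshold:
            successes += 1
    return successes / days


print("large hospital:", estimate_share_of_days(45, 365))
print("small hospital:", estimate_share_of_days(15, 365))
```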
11.2.2 Task 6
Use the Compute tab to compute a solution. Explain the solution in your own words!
The Babies Calculator can be used to compute a solution without the need for a spreadsheet. Using the default settings, the following results were obtained:
- Probability of more than 60% of male births in Large Hospital = 0.0685
- Probability of more than 60% of male births in Small Hospital = 0.1425
Note that your results will be slightly different because the computation relies on a random number generator. The figures produced by the simulation show that the estimated probabilities converge towards a certain value as the number of simulated days increases. The values listed above are simply the final estimates of the simulation, because they are based on the largest number of days and are therefore believed to be closest to the true probabilities. On the left side of the figures there are large fluctuations: only a few days have been simulated at that point, and because a “success” (i.e. a day on which more than 60% of the babies born are boys) is relatively rare, each single success shifts the estimate considerably.
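The running estimate plotted in these figures can be imitated with the sketch below. It is not the Babies Calculator’s code; the function name `running_estimates` and the seed are hypothetical. After each simulated day it records the share of success days so far, so the early entries fluctuate strongly and the later ones settle down.

```python
import random


def running_estimates(births_per_day: int, days: int, threshold: float = 0.6,
                      seed: int = 2) -> list[float]:
    """Estimated probability after each simulated day (cumulative share of successes)."""
    rng = random.Random(seed)
    successes = 0
    estimates = []
    for day in range(1, days + 1):
        boys = sum(1 for _ in range(births_per_day) if rng.random() >= 0.5)
        if boys / births_per_day > threshold:  # a "success" day
            successes += 1
        estimates.append(successes / day)      # estimate based on the first `day` days
    return estimates


est = running_estimates(15, 3650)  # small hospital, ten years
for day in (10, 30, 100, 365, 1000, 3650):
    print(f"after {day:4d} days: {est[day - 1]:.4f}")
```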
11.2.3 Task 7
How could you increase the accuracy of the numerical solution? Note: later you will formulate a solution based on the Binomial Distribution -- for now it is sufficient to increase the accuracy of the solution by changing one of the simulation parameters (use the Compute tab of the previous task).
It is possible to increase the accuracy of the numerical simulation by increasing the number of simulations (i.e. the number of days). Instead of simulating just one year, we could simulate a much longer period (e.g. ten years).
The following results were obtained with ten years instead of just one:
- Probability of more than 60% of male births in Large Hospital = 0.0663
- Probability of more than 60% of male births in Small Hospital = 0.1403
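Why more days help can be quantified with the usual standard error of an estimated proportion, \(\sqrt{p(1-p)/n}\), where \(n\) is the number of simulated days (assuming independent days). The sketch below uses \(p \approx 0.14\), a rough value taken from the simulated small-hospital estimates; the function name is our own.

```python
import math


def standard_error(p: float, days: int) -> float:
    """Approximate standard error of a proportion estimated from `days` independent days."""
    return math.sqrt(p * (1 - p) / days)


p_small = 0.14  # rough value based on the simulated small-hospital estimates
for years in (1, 10, 100):
    days = 365 * years
    print(f"{years:3d} year(s): standard error ~ {standard_error(p_small, days):.4f}")
```

Simulating ten times as many days shrinks the standard error only by a factor of \(\sqrt{10} \approx 3.16\), so every additional digit of accuracy costs about a hundred times more simulated days.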
Note: it is possible to solve this problem exactly with the Binomial Distribution, which is explained later.
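As a preview, and only as a sketch that anticipates the later chapter, the exact probabilities follow from treating the number of boys per day as Binomial(births per day, 0.5) and summing the probabilities of all counts above the threshold. The function name `prob_more_than` is our own; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def prob_more_than(births_per_day: int, share: float, p_boy: float = 0.5) -> float:
    """P(boys / births_per_day > share) for boys ~ Binomial(births_per_day, p_boy)."""
    return sum(
        comb(births_per_day, k) * p_boy**k * (1 - p_boy) ** (births_per_day - k)
        for k in range(births_per_day + 1)
        if k / births_per_day > share  # strictly more than the given share
    )


for n in (45, 15):
    print(f"{n} births/day: >60%: {prob_more_than(n, 0.6):.4f}, >80%: {prob_more_than(n, 0.8):.2e}")
```

For the small hospital and the 60% threshold this gives \(4944/32768 \approx 0.151\), which can be compared with the simulated estimates above.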
11.2.4 Task 8
What would happen if we changed the “percentage of male births per day” to 80%?
If the percentage of male births is 80% (instead of 60%), then the probabilities become much smaller. The Babies Calculator was used to simulate 10 years (with a percentage of 80%), which yielded the following results:
- Probability of more than 80% of male births in Large Hospital = 0
- Probability of more than 80% of male births in Small Hospital = 0.002740
The Figure for the large hospital showed a flat line, which means that in 10 years there was not a single day on which more than 80% of births were boys. Hence, the estimated probability is zero, which is (obviously) not the correct answer because we know that such a day can (theoretically) occur, even in a large hospital. The problem is that the true probability is so small that it is difficult to estimate with a simulation procedure. We would need to run many additional simulations to obtain an accurate result (possibly many thousands of simulated years).
The Figure for the small hospital showed that the estimated probability was still fluctuating at the end of year 10. Again, this means that we should further increase the number of simulated days to obtain a reasonably reliable result.
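Two standard rules of thumb make these remarks concrete. When zero successes are observed in \(n\) independent days, the “rule of three” gives an approximate 95% upper bound of \(3/n\) on the probability, so a zero estimate only says the probability is small. To reach a standard error of 10% of \(p\), roughly \((1-p)/(p \cdot 0.1^2)\) days are needed. The sketch below applies both to the 10-year results above; the function names are our own.

```python
import math


def rule_of_three_upper(days: int) -> float:
    """Approximate 95% upper bound on p when zero successes were seen in `days` days."""
    return 3 / days


def days_for_relative_error(p: float, rel_se: float) -> int:
    """Days needed so that the standard error is about rel_se * p (binomial approximation)."""
    return math.ceil((1 - p) / (p * rel_se**2))


print(f"large hospital, 0 successes in 3650 days: p below ~{rule_of_three_upper(3650):.5f}")
n_small = days_for_relative_error(0.002740, 0.10)  # small-hospital estimate from above
print(f"small hospital: about {n_small} days (~{n_small / 365:.0f} years)")
```

Even for the small hospital, a 10% relative standard error would require roughly a century of simulated days; for the large hospital, whose probability is far smaller, the required simulation length grows accordingly.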
11.2.5 Task 9
Explain in your own words (i.e. without relying on mathematics) the (weak) Law of Large Numbers.
This is an attempt at an informal description.
The Law of Large Numbers implies that the average result of many independent repetitions of the same random experiment settles down to a stable long-term value: the more repetitions we make, the less likely it becomes that the average is far away from that value. This does not imply that the average is already close to the true long-term value after only a few repetitions, nor does it give us any guarantee about how quickly the average approaches that value.