101 What if \(\sigma\) is unknown?

Testing one-sided and two-sided hypotheses about the mean is straightforward if the population standard deviation \(\sigma\) is known. In practice, however, this is rarely the case, which means that we have to rely on the sample standard deviation \(s\) as an estimate of \(\sigma\).

As explained before, we use the \(t\)-distribution for small samples and the (Gaussian) normal distribution for large samples. Of course, one may also opt to use the \(t\)-distribution by default, because it converges to the standard normal distribution as \(N\) approaches \(+\infty\).

101.1 Case 1

The duration of phone calls in a particular region is investigated. The population distribution of call durations is asymmetric. We wish to test the Null Hypothesis H\(_0: \mu = \mu_0 = 7\) minutes versus the Alternative Hypothesis H\(_A: \mu \neq \mu_0 = 7\) minutes at the significance level \(\alpha = 0.05\).

For computation, we convert everything to seconds: \(\mu_0 = 7 \times 60 = 420\) seconds.

A simple random sample (\(N = 1000\)) was drawn and the sample statistics were computed (\(m = 475.2\) seconds and \(s = 151\) seconds).
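The standard error of the mean, \(\frac{s}{\sqrt{N}}\), appears in all of the formulas of this case; with the sample values it works out to

\[\frac{s}{\sqrt{N}} = \frac{151}{\sqrt{1000}} \simeq \frac{151}{31.623} \simeq 4.775 \text{ seconds}\]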

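Since \(N\) is large, the Central Limit Theorem allows us to approximate the sampling distribution of \(\bar{X}\) under H\(_0\) by a normal distribution with mean \(\mu_0 = 420\) and standard deviation \(\frac{s}{\sqrt{N}} = \frac{151}{\sqrt{1000}}\), even though the population distribution itself is asymmetric: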

\[Y = \frac{1}{\frac{151}{\sqrt{1000}} \sqrt{2 \pi}} e^{-\frac{1}{2} \left( \frac{\bar{X} - 420}{\frac{151}{\sqrt{1000}}} \right)^2}\]
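As a quick numerical check of the density \(Y\), the sketch below (assuming Python 3.8+ for `statistics.NormalDist`) writes the formula out by hand in a hypothetical helper `sampling_density` and compares it with the standard-library normal density with mean 420 and standard deviation \(151/\sqrt{1000}\).

```python
import math
from statistics import NormalDist

MU0 = 420.0  # hypothesised mean under H0 (seconds)
S = 151.0    # sample standard deviation, used in place of sigma (seconds)
N = 1000     # sample size

SE = S / math.sqrt(N)  # standard error of the mean, about 4.775 seconds


def sampling_density(x_bar: float) -> float:
    """Approximate density Y of the sample mean under H0, written out as in the text."""
    z = (x_bar - MU0) / SE
    return math.exp(-0.5 * z * z) / (SE * math.sqrt(2 * math.pi))


# The hand-written formula agrees with the standard-library normal density.
approx = NormalDist(mu=MU0, sigma=SE)
for x in (410.0, 420.0, 430.0):
    print(x, round(sampling_density(x), 6), round(approx.pdf(x), 6))
```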

The region of acceptance can be obtained by finding \(k\) in P\((420 - k \leq \bar{X} \leq 420 + k) = 0.95\):

\[ \begin{aligned}\text{P} \left( \frac{420 - k - 420}{\frac{151}{\sqrt{1000}}} \leq \frac{\bar{X} - 420}{\frac{151}{\sqrt{1000}}} \leq \frac{420 + k -420}{\frac{151}{\sqrt{1000}}} \right) &= \text{P} \left( \frac{-k}{4.775} \leq Z \leq \frac{k}{4.775} \right)\\ &= 0.95\end{aligned} \]

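The Gaussian Table (Appendix E) gives the value 1.96, which leaves a probability of 0.025 in each tail of the standard normal distribution; this allows us to find \(k\):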

\[\frac{k}{4.775} = 1.96 \Rightarrow k = 1.96 \times 4.775 \simeq 9.36\]
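Instead of reading the Gaussian Table, the critical value and \(k\) can also be computed directly. A minimal sketch, assuming Python 3.8+ (`NormalDist().inv_cdf` returns a quantile of the standard normal distribution):

```python
import math
from statistics import NormalDist

alpha = 0.05
se = 151 / math.sqrt(1000)                    # standard error, about 4.775 seconds
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
k = z_crit * se                               # half-width of the region of acceptance

print(f"z = {z_crit:.2f}, k = {k:.2f}")
```

It prints `z = 1.96, k = 9.36`, matching the hand calculation.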

With the convenient mental approximation \(1.96 \simeq 2\), we get \(k \simeq 2 \times 4.775 \simeq 9.55\), so the approximate region of acceptance is (410, 430). Since \(m = 475.2 \notin (410, 430)\), H\(_0\) is rejected. Note: the conclusion does not change if the \(t\)-distribution (Appendix F) is used instead, because with \(N - 1 = 999\) degrees of freedom its critical value (about 1.96) is virtually identical to the Gaussian one.
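The decision can be phrased in two equivalent ways: the sample mean falls outside \(420 \pm k\), or the standardized statistic \(z = \frac{m - \mu_0}{s/\sqrt{N}}\) exceeds the critical value 1.96 in absolute value. The sketch below (Python 3.8+, standard library only) checks both rules for the values of this case:

```python
import math
from statistics import NormalDist

m, mu0, s, n, alpha = 475.2, 420.0, 151.0, 1000, 0.05

se = s / math.sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Rule 1: is the sample mean outside the region of acceptance mu0 +/- k?
k = z_crit * se
outside_region = not (mu0 - k <= m <= mu0 + k)

# Rule 2 (equivalent): is the standardized statistic beyond +/- z_crit?
z_obs = (m - mu0) / se
beyond_critical = abs(z_obs) > z_crit

print(f"z_obs = {z_obs:.2f}, reject H0: {outside_region}")
assert outside_region == beyond_critical  # both rules give the same decision
```

Here \(z \simeq 11.56\), far beyond 1.96, so both rules reject H\(_0\).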

101.2 Case 2

Consider the data in Table 101.1 which displays the number of sleeping hours gained from an experimental intervention.

Table 101.1: Experimental Sleep Intervention

student	1	2	3	4	5	6	7	8	9	10
extra sleep (hours)	0.7	-1.1	-0.2	1.2	0.1	3.4	3.7	0.8	1.8	2.0
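The sample mean and standard deviation of the extra-sleep values can be recomputed from Table 101.1 with Python's `statistics` module. A minimal sketch; note the two standard deviations: `stdev` divides by \(N - 1\), which is the estimator assumed by the \(t\)-distribution, while `pstdev` divides by \(N\).

```python
from statistics import mean, pstdev, stdev

# Extra sleep (hours) for students 1-10, copied from Table 101.1.
extra_sleep = [0.7, -1.1, -0.2, 1.2, 0.1, 3.4, 3.7, 0.8, 1.8, 2.0]

m = mean(extra_sleep)        # sample mean
s = stdev(extra_sleep)       # divisor N - 1 (estimator used with the t-distribution)
s_pop = pstdev(extra_sleep)  # divisor N, shown for comparison

print(f"m = {m:.2f}, s = {s:.2f}, divisor-N value = {s_pop:.2f}")
```

This gives \(m = 1.24\) and \(s \simeq 1.53\) with divisor \(N - 1\) (1.45 with divisor \(N\)).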

We assume that the students who received the treatment have been independently and randomly chosen from the population.

Because \(N = 10\) is small, we additionally assume that the population distribution of the extra-sleep variable is approximately normal.

We assume that the intervention cannot decrease the mean number of sleeping hours, hence we employ a one-sided test:

\[ \begin{align*} &\text{H}_0: \mu = \mu_0 = 0 \\ &\text{H}_A: \mu > \mu_0 = 0 \end{align*} \]

The sample statistics have been computed: \(m = 1.24\) hours and \(s \simeq 1.53\) hours (the sample standard deviation, computed with divisor \(N - 1\)).

Because \(\sigma\) is unknown and \(N = 10\) is small, the standardized statistic \(\frac{\bar{X} - \mu_0}{\frac{s}{\sqrt{N}}}\) cannot be treated as standard normal. Under the normality assumption, it follows a Student-\(t\) distribution with \(N - 1 = 9\) degrees of freedom:

\[t = \frac{\bar{X} - \mu_0}{\frac{s}{\sqrt{N}}}\]

The region of acceptance can be computed by finding \(k\) in P\((\bar{X} \leq \mu_0 + k) = 0.95\):

\[\text{P} \left( \frac{\bar{X} - \mu_0}{\frac{s}{\sqrt{N}}} \leq \frac{\mu_0 + k - \mu_0}{\frac{s}{\sqrt{N}}} \right) = 0.95\]

\[\text{P} \left( t \leq \frac{k}{\frac{s}{\sqrt{N}}} \right) = 0.95\]

The \(t\)-distribution table (Appendix F) with 9 degrees of freedom gives the 95% quantile \(t_{0.95} \simeq 1.83\), which implies

\[k \simeq 1.83 \frac{s}{\sqrt{N}} = \frac{1.83 \times 1.53}{\sqrt{10}} \simeq 0.89\]

Hence the region of acceptance is \(\bar{X} \leq 0 + 0.89 = 0.89\). Since \(m = 1.24 > 0.89\), we reject H\(_0\) and conclude that the intervention increases the mean number of sleeping hours.
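The one-sided test can be reproduced in a few lines of Python. This is a minimal sketch under stated assumptions: it recomputes the statistics from Table 101.1 with `statistics.stdev` (divisor \(N - 1\)), and, because the standard library has no \(t\) quantile function, it hard-codes the table value \(t_{0.95} \simeq 1.833\) for 9 degrees of freedom (the constant name `T_CRIT_95_DF9` is hypothetical; with SciPy installed, `scipy.stats.t.ppf(0.95, 9)` returns the same value).

```python
import math
from statistics import mean, stdev

# Extra sleep (hours) from Table 101.1.
extra_sleep = [0.7, -1.1, -0.2, 1.2, 0.1, 3.4, 3.7, 0.8, 1.8, 2.0]
mu0 = 0.0  # hypothesised mean under H0

# Assumption: 95% quantile of the t-distribution with N - 1 = 9 degrees of
# freedom, read from a t table; the standard library cannot compute it.
T_CRIT_95_DF9 = 1.833

n = len(extra_sleep)
m = mean(extra_sleep)
se = stdev(extra_sleep) / math.sqrt(n)  # estimated standard error s / sqrt(N)

k = T_CRIT_95_DF9 * se    # the region of acceptance is X_bar <= mu0 + k
t_obs = (m - mu0) / se    # observed t statistic

reject = m > mu0 + k      # equivalently: t_obs > T_CRIT_95_DF9
print(f"k = {k:.3f}, t = {t_obs:.2f}, reject H0: {reject}")
```

With the unrounded \(s\) this gives \(k \simeq 0.884\) and \(t \simeq 2.57 > 1.833\), so H\(_0\) is rejected.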