97.1.1 Statistical Hypothesis: Testing the Variance - Population
The population distribution of the random variable \(X\) is written as \(X \sim \text{N} \left( \mu, \sigma^2 \right)\) where \(\mu\) and \(\sigma^2\) represent the mean and variance of the normal distribution. In this representation it is assumed that \(\sigma^2\) is unknown. The parameter \(\mu\) can be either known or unknown.
97.1.2 Statistical Hypothesis: Testing the Variance - Sample
The statistic for the sample mean is \(\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i\) where \(n\) is the number of observations in the sample. The sample statistic for the variance can be written in terms of \(\mu\) (if this population parameter is known) or in terms of the sample mean \(\bar{x}\):
\[
s^2 = \frac{1}{n} \sum_{i=1}^n \left( x_i - \mu \right)^2 \qquad \text{or} \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^n \left( x_i - \bar{x} \right)^2 .
\]
The distribution of the sample variance can be written in terms of \(\mu\) (if this population parameter is known) or in terms of the sample mean \(\bar{x}\):
\[
\frac{n s^2}{\sigma^2} \sim \chi_n^2 \qquad \text{or} \qquad \frac{(n-1) s^2}{\sigma^2} \sim \chi_{n-1}^2 .
\]
The sum of two independent \(\chi^2\)-distributed variates, with degrees of freedom \(n_1\) and \(n_2\) respectively, is also \(\chi^2\)-distributed with degrees of freedom equal to \(n_1 + n_2\). In general, the difference of two independent \(\chi^2\) variates is not \(\chi^2\)-distributed.
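This additivity property can be checked with a small Monte Carlo sketch in plain Python (the degrees of freedom, seed, and sample size below are arbitrary illustrative choices): a \(\chi^2\)-variate with \(n\) degrees of freedom is generated as the sum of \(n\) squared standard normal draws.

```python
import random
import statistics

random.seed(42)
N = 100_000
n1, n2 = 3, 4

def chi2_draw(df):
    """One chi-squared draw: the sum of df squared standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

# Sum of an independent chi2(n1) draw and chi2(n2) draw, N times.
sums = [chi2_draw(n1) + chi2_draw(n2) for _ in range(N)]

# If additivity holds, the sums behave like chi2(n1 + n2): by Property 2
# the empirical mean should be near n1 + n2 = 7 and the variance near 14.
print(round(statistics.fmean(sums), 2), round(statistics.pvariance(sums), 2))
```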
97.1.4.3 Property 2
The expected value of a \(\chi^2\)-distributed variate is equal to the number of degrees of freedom:
\[
\text{E} \left( \chi_n^2 \right) = n
\]
The variance of a \(\chi^2\)-distributed variate is equal to two times the number of degrees of freedom:
\[
\text{V} \left( \chi_n^2 \right) = 2n
\]
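Both moments follow directly from writing \(\chi_n^2 = \sum_{i=1}^n Z_i^2\) with independent \(Z_i \sim \text{N}(0,1)\), using \(\text{E}\left( Z_i^2 \right) = 1\) and \(\text{E}\left( Z_i^4 \right) = 3\) for a standard normal variate:
\[
\text{E} \left( \chi_n^2 \right) = \sum_{i=1}^n \text{E} \left( Z_i^2 \right) = n ,
\qquad
\text{V} \left( \chi_n^2 \right) = \sum_{i=1}^n \left[ \text{E} \left( Z_i^4 \right) - \text{E} \left( Z_i^2 \right)^2 \right] = n \left( 3 - 1 \right) = 2n .
\]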
97.1.5 Approximation of the Chi-squared distribution
97.1.5.1 Rule of thumb
For large samples, the distribution of
\[
\sqrt{2 \chi_n^2} - \sqrt{2 n - 1}
\]
can be approximated by the standard normal distribution N\((0,1)\).
97.1.5.2 Example
Let \(n = 30\) and find the value \(c\) for which P\(\left( \chi_n^2 \geq c \right) = 0.05\).
Setting \(\sqrt{2c} - \sqrt{2n - 1} = k\) with \(k = 1.645\) (since P\(\left( Z \geq 1.645 \right) = 0.05\) for \(Z \sim \text{N}(0,1)\)) and solving for \(c\) gives \(c = \frac{1}{2} \left( k + \sqrt{2n - 1} \right)^2 = 43.49\). According to the \(\chi^2\)-table, the correct value for the critical value is 43.773 (Appendix G). The approximation converges towards the correct value as \(n \rightarrow +\infty\).
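The approximation in this example can be reproduced in a few lines using only the Python standard library (obtaining \(k\) from `statistics.NormalDist` is a convenience choice, not part of the original calculation):

```python
import math
from statistics import NormalDist

n = 30
alpha = 0.05
# 95th percentile of N(0,1); the text's k = 1.645.
k = NormalDist().inv_cdf(1 - alpha)

# Solve sqrt(2c) - sqrt(2n - 1) = k for c.
c = (k + math.sqrt(2 * n - 1)) ** 2 / 2
print(round(c, 2))  # approximately 43.49; the exact table value is 43.773
```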
From a random sample of size \(n\), drawn from a normally distributed population with given mean and standard deviation, the sample variance can be estimated as described in the following cases.
An interesting consequence of the previous case is that the statistic \(\frac{ns^2}{\sigma^2} = \frac{n \frac{1}{n}\sum_{i=1}^{n}\left( x_i - \mu \right)^2}{\sigma^2}\) is also \(\chi^2\)-distributed but with \(n\) degrees of freedom instead of \(n-1\). The loss of one degree of freedom in the first case is due to the substitution of the unknown population parameter \(\mu\) by the sample mean \(\bar{x}\).
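The loss of one degree of freedom can be illustrated with a short simulation sketch (the population parameters, sample size, and seed below are arbitrary assumptions): averaged over many samples, the statistic built with \(\mu\) is close to \(n\), while the one built with \(\bar{x}\) is close to \(n - 1\), consistent with Property 2.

```python
import random
from statistics import fmean

random.seed(7)
mu, sigma, n, reps = 10.0, 2.0, 8, 50_000

stat_known, stat_unknown = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(x)
    # Statistic using the known population mean mu: ~ chi2 with n df.
    stat_known.append(sum((xi - mu) ** 2 for xi in x) / sigma ** 2)
    # Statistic using the sample mean xbar: ~ chi2 with n - 1 df.
    stat_unknown.append(sum((xi - xbar) ** 2 for xi in x) / sigma ** 2)

# The averages approximate the degrees of freedom: n and n - 1.
print(round(fmean(stat_known), 2), round(fmean(stat_unknown), 2))
```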
97.1.7 Summary
Table 97.2: Estimation of Variance -- Test Statistics & Distributions
Hence P\(\left( \chi_7^2 \geq 12.44 \right) = 0.0869\). Note: the exact p-value cannot be determined based on Appendix G (it is only possible to use an approximate interpolation). With the use of statistical software, however, it is possible to obtain the exact p-value.
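As noted, statistical software gives the exact p-value; a minimal sketch without external libraries is to evaluate the \(\chi^2\) survival function through the standard series expansion of the regularized lower incomplete gamma function (the implementation below is an illustrative choice, not the document's method):

```python
import math

def chi2_sf(x, k):
    """P(X >= x) for X ~ chi-squared with k degrees of freedom, via the
    series expansion of the regularized lower incomplete gamma function
    P(a, t) = t^a e^(-t) / Gamma(a) * sum_{m>=0} t^m / (a(a+1)...(a+m))."""
    a, t = k / 2.0, x / 2.0
    term = 1.0 / a          # m = 0 term of the series
    total = term
    m = 0
    while term > 1e-15 * total:
        m += 1
        term *= t / (a + m)
        total += term
    lower = total * math.exp(a * math.log(t) - t - math.lgamma(a))
    return 1.0 - lower

print(round(chi2_sf(12.44, 7), 4))  # p-value, approximately 0.0869
```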
97.2.2.3 Conclusion
Since the probability \(0.0869\) is larger than \(\alpha = 0.05\), there is no reason to reject the Null Hypothesis.
The two-sided 90% acceptance region for the Sample Variance \(s^2\) (under H\(_0\)) is \([0.11146, 0.72345]\). In other words, P\(\left( 0.11146 \leq s^2 \leq 0.72345 \right) = 0.90\).
The two-sided 90% Confidence Interval for the Population Variance \(\sigma^2\) is \([0.31847, 2.06704]\). In other words, P\(\left( 0.31847 \leq \sigma^2 \leq 2.06704 \right) = 0.90\).
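Both intervals can be reproduced from the Appendix G quantiles \(\chi^2_{0.05;7} = 2.16735\) and \(\chi^2_{0.95;7} = 14.0671\). The sample values below (\(n = 8\), \(s^2 = 0.64\), \(\sigma_0^2 = 0.36\)) are not stated in this excerpt; they are assumptions inferred to be consistent with the test statistic \((n-1)s^2/\sigma_0^2 = 12.44\) used above.

```python
# Appendix G quantiles for 7 degrees of freedom (5% in each tail).
chi2_lo, chi2_hi = 2.16735, 14.0671   # chi2_{0.05;7}, chi2_{0.95;7}

# Assumed (inferred) sample values, chosen so that the test statistic
# (n - 1) * s2 / sigma0_sq equals 12.44 as in the worked test above.
n, s2, sigma0_sq = 8, 0.64, 0.36

# Two-sided 90% acceptance region for the sample variance s^2 under H0:
# [sigma0^2 * chi2_lo / (n - 1), sigma0^2 * chi2_hi / (n - 1)].
accept = (sigma0_sq * chi2_lo / (n - 1), sigma0_sq * chi2_hi / (n - 1))

# Two-sided 90% confidence interval for the population variance sigma^2:
# [(n - 1) * s^2 / chi2_hi, (n - 1) * s^2 / chi2_lo].
ci = ((n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo)

print([round(v, 5) for v in accept], [round(v, 5) for v in ci])
```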