
20  Normal Distribution (Gaussian Distribution)

The random variate \(X\), defined on the range \(-\infty < X < +\infty\), is said to have a Normal Distribution (i.e. \(X \sim \text{N}\left( \mu, \sigma^2 \right)\)) with location parameter \(\mu\) and scale parameter \(\sigma\), where \(-\infty < \mu < +\infty\) and \(\sigma > 0\).

20.1 Probability Density Function

\[ \text{f}(X) = \frac{e^{-\frac{1}{2} \left( \frac{X - \mu}{\sigma} \right)^2} }{\sigma \sqrt{2 \pi}} \]

The figure below shows an example of the Normal Probability Density function with \(location = 5\) and \(scale = 2\).

Code
x <- seq(-5,17,length=1000)
hx <- dnorm(x, mean = 5, sd = 2)
plot(x, hx, type="l", xlab="X", ylab="f(X)", xlim=c(-5,17), main="Normal density", sub = "(location = 5 and scale = 2)")
Figure 20.1: Example of Normal Probability Density Function (location = 5 and scale = 2)

20.2 Distribution Function

\[ \text{F}(x) = \frac{1}{\sigma \sqrt{2 \pi}} \int_{-\infty}^{x} e^{- \frac{(t-\mu)^2}{2 \sigma^2}} \text{d}t \]

The figure below shows an example of the Normal Distribution with \(location = 5\) and \(scale = 2\).

Code
x <- seq(-5,17,length=1000)
hx <- pnorm(x, mean = 5, sd = 2)
plot(x, hx, type="l", xlab="X", ylab="F(X)", xlim=c(-5,17), main="Normal distribution", sub = "(location = 5 and scale = 2)")
Figure 20.2: Example of Normal Distribution (location = 5 and scale = 2)

20.3 Moment Generating Function

\[ M_X(t) = e^{\mu t + \frac{1}{2} \sigma^2 t^2 } \]
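As a quick check, the uncentered moments listed below follow by differentiating the Moment Generating Function and evaluating at \(t = 0\):

\[ \mu_1' = M_X'(0) = \left[ \left( \mu + \sigma^2 t \right) e^{\mu t + \frac{1}{2} \sigma^2 t^2} \right]_{t=0} = \mu \]

\[ \mu_2' = M_X''(0) = \left[ \left( \sigma^2 + \left( \mu + \sigma^2 t \right)^2 \right) e^{\mu t + \frac{1}{2} \sigma^2 t^2} \right]_{t=0} = \mu^2 + \sigma^2 \]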

20.4 1st Uncentered Moment

\[ \mu_1' = \mu \]

20.5 2nd Uncentered Moment

\[ \mu_2' = \mu^2 + \sigma^2 \]

20.6 3rd Uncentered Moment

\[ \mu_3' = \mu \left( \mu^2 + 3 \sigma^2 \right) \]

20.7 4th Uncentered Moment

\[ \mu_4' = \mu^4 + 6 \mu^2 \sigma^2 + 3 \sigma^4 \]

20.8 Centered Moments

\[ \mu_j = 0 \text{ for $j$ is odd} \]

\[ \mu_j = \frac{j!}{\left( \frac{j}{2} \right) ! \, 2^{\left( \frac{j}{2} \right)}} \sigma^j \text{ for $j$ even} \]

20.9 2nd Centered Moment

\[ \mu_2 = \sigma^2 \]

20.10 3rd Centered Moment

\[ \mu_3 = 0 \]

20.11 4th Centered Moment

\[ \mu_4 = 3 \sigma^4 \]
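The centered-moment formulas above can be verified numerically with R's integrate function; this is a minimal sketch in which \(\mu = 5\) and \(\sigma = 2\) are arbitrary illustration values, not taken from the text:

```r
# Numerical check of the centered-moment formulas
# (mu = 5 and sigma = 2 are arbitrary illustration values)
mu <- 5
sigma <- 2
central_moment <- function(j) {
  integrate(function(x) (x - mu)^j * dnorm(x, mean = mu, sd = sigma),
            lower = -Inf, upper = Inf)$value
}
central_moment(3)   # odd centered moment: approximately 0
central_moment(4)   # approximately 3 * sigma^4 = 48
```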

20.12 Expected Value

\[ \text{E}(X) = \mu \]

20.13 Variance

\[ \text{V}(X) = \sigma^2 \]

20.14 Median

\[ \text{Med}(X) = \mu \]

20.15 Mode

\[ \text{Mo}(X) = \mu \]

20.16 Coefficient of Skewness

\[ g_1 = 0 \]

20.17 Coefficient of Kurtosis

\[ g_2 = 3 \]

20.18 Coefficient of Variation

\[ VC = \frac{\sigma}{\mu} \]

20.19 Parameter Estimation

\[ \hat{\mu}_{ML} = \bar{x} \text{ (maximum likelihood and unbiased)} \]

\[ \hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 \text{ (biased maximum likelihood)} \]

\[ \hat{\sigma}^2_{\text{unb}} = \frac{1}{n-1}\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 = \frac{n}{n-1}\hat{\sigma}^2_{ML} \text{ (unbiased estimator)} \]

where

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

where \(s^2=\hat{\sigma}^2_{\text{unb}}\) denotes the usual sample variance.
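The relation between the biased ML estimate and the unbiased estimate can be checked in a few lines of R; the sample below is an arbitrary illustration, and note that R's built-in var() already uses the \(1/(n-1)\) denominator:

```r
# Sketch: biased ML variance estimate versus unbiased sample variance
set.seed(123)                              # arbitrary seed for reproducibility
x <- rnorm(50, mean = 5, sd = 2)           # illustration sample
n <- length(x)
mu_hat <- mean(x)                          # ML (and unbiased) estimate of mu
s2_ml  <- sum((x - mu_hat)^2) / n          # biased ML estimate of sigma^2
s2_unb <- var(x)                           # unbiased estimate of sigma^2
all.equal(s2_unb, n / (n - 1) * s2_ml)     # the relation stated above
```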

20.20 R Module

The best fitting Normal Density function can be obtained by estimating \(\mu\) and \(\sigma\) with the Maximum Likelihood procedure, an implementation of which is available on the public website:

  • https://compute.wessa.net/rwasp_fitdistrnorm.wasp

The Maximum Likelihood Fitting for the Normal Distribution is also available in RFC under the menu “Distributions / ML Fitting”.

If you prefer to run the Maximum Likelihood procedure on your local computer, the code snippets below can be pasted into the R console. First, install the “MASS” package (you only need to do this once):

install.packages("MASS")

Once the library has been installed, we can use the following code to perform Maximum Likelihood fitting:

library(MASS)
x <- runif(n = 1000) # draw Uniform random numbers
r <- fitdistr(x,'normal')
print(r)
      mean           sd     
  0.517353337   0.285413521 
 (0.009025568) (0.006382040)
Code
par1 = 8
par2 = 'Sturges'
ylab = 'density'
xlab = 'value of data series'
main = 'Histogram and Fitted Normal Density'
myhist<-hist(x, col=par1, breaks=par2, main=main, ylab=ylab, xlab=xlab, freq=F)
curve(1/(r$estimate[2]*sqrt(2*pi))*exp(-1/2*((x-r$estimate[1])/r$estimate[2])**2), min(x), max(x), add=T)
Figure 20.3: Example of ML Fitting for the Normal Distribution

The par2 parameter defines the suggested number of bins of the Histogram (see Chapter 62). Usually, however, we keep the default value ‘Sturges’ so that the function computes the number of bins according to the Sturges algorithm (Sturges 1926).

The main functions in this example are hist (displays a histogram), curve (draws the curve of the Normal Density), and fitdistr (performs the actual Maximum Likelihood fitting).
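The Sturges rule itself is simple: it suggests roughly \(\lceil \log_2 n \rceil + 1\) bins for \(n\) observations. Base R exposes it as nclass.Sturges(), which hist() consults when breaks = 'Sturges'; a minimal sketch:

```r
# Sketch: the Sturges rule picks ceiling(log2(n) + 1) bins;
# nclass.Sturges() is the base-R implementation used by hist()
x <- rnorm(1000)        # arbitrary illustration sample; only its length matters
nclass.Sturges(x)       # ceiling(log2(1000) + 1) = 11
```

Note that hist() treats this value only as a suggestion, since the actual break points are rounded to "pretty" values.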

20.21 Example

We analyze the time series of monthly divorces (in thousands) and wish to find out whether it can be adequately described by the Normal Distribution. The Normal Density function seems to be a good fit for the Histogram of monthly divorces:

(Interactive Shiny app available in the online version.)

The visual fit suggests that a Normal model may be reasonable for the monthly divorces, with an approximate mean of 2.55 and standard deviation of 0.367 (the series is expressed in thousands of divorces). A formal goodness-of-fit assessment is still recommended; see Section 2, Section 124.1, and Chapter 125.

20.22 Random Number Generator

If \(\text{U}(0,1)\) denotes a random variate with a Uniform Distribution (such as the pseudo-random numbers generated by a digital computer), then the following approximation can be made arbitrarily accurate by increasing \(k\):

\[ \text{N}(0,1) \sim \frac{\sum_{i=1}^{k} \text{U}_i(0,1) - \frac{k}{2}}{\sqrt{\frac{k}{12}}} \]

For instance, with \(k=12\) we can generate approximately standard normally distributed random numbers:

\[ \text{N}(0,1) \sim \sum_{i=1}^{12} \text{U}_i(0,1) - 6 \]

This is a classical pedagogical approximation. In practice, modern software (including rnorm) uses more accurate and efficient algorithms.

Based on random numbers for the Standard Normal Distribution, it is possible to obtain random numbers for any \(\mu \in \mathbb{R}\) and \(\sigma \in \mathbb{R}_0^+\) by using the following relationship

\[ \text{N}(\mu, \sigma^2) \sim \sigma \text{N}(0,1) + \mu \]
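The classroom generator and the location-scale transformation can be combined in a short R sketch; the target parameters \(\mu = 5\) and \(\sigma = 2\) are arbitrary illustration values, and in practice rnorm() should be preferred:

```r
# Sketch: sum of k = 12 Uniform(0,1) draws minus 6 approximates N(0,1);
# rescaling with sigma and shifting by mu then gives N(mu, sigma^2)
set.seed(42)                                   # arbitrary seed
k <- 12
z <- replicate(10000, sum(runif(k)) - k / 2)   # approximate N(0,1) draws
x <- 2 * z + 5                                 # approximate N(5, 2^2) draws
c(mean(x), sd(x))                              # close to mu = 5 and sigma = 2
```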

20.23 R Module

The Random Number Generator for the Normal Density function can be found on the public website:

  • https://compute.wessa.net/rwasp_rngnorm.wasp

The Random Number Generator for the Normal Distribution is also available in RFC under the menu “Distributions / Random Numbers - Normal” (this only applies when using the “default” profile).

If you prefer to generate Normal Random Numbers on your local computer, the following code snippet can be used in the R console¹:

library(MASS)
library(msm)

par1 = 100
par2 = 0
par3 = 1
par4 = 2
par5 = 'N'
par6 = 'Sturges'
par7 = -Inf
par8 = Inf

ylab = 'density'
xlab = 'value of generated random numbers'
main = 'Histogram of Generated Random Numbers'
x <- rtnorm(par1, par2, par3, par7, par8)
if ((par7 == -Inf) & (par8 == Inf)) {r <- fitdistr(x,'normal')}

if (par5 == 'Y') {
  print(x)
}

print(r)
      mean           sd     
  -0.09163643    1.01272177 
 ( 0.10127218) ( 0.07161024)

The figure below shows that the Normal Density function fits the simulated data well:

Code
myhist<-hist(x, col=par4, breaks=par6, main=main, ylab=ylab, xlab=xlab, freq=F)
curve(1/(r$estimate[2]*sqrt(2*pi))*exp(-1/2*((x-r$estimate[1])/r$estimate[2])^2), min(x), max(x), add=T)
Figure 20.4: Example of ML Fitting for the Random Number Generator – Normal Distribution

The script produces a Histogram and the associated Normal Density function (through ML Fitting) of the randomly generated numbers. The meaning and interpretation of histograms is discussed in detail in Chapter 62.

Note that it is also possible to use the rnorm function to simulate Normal Random Numbers without the need to use an external library:

rnorm(100, 5, 3)
  [1]  1.80146963  5.72827816  6.64020840  8.06626891  3.90505032  4.66242374
  [7] 10.68673293  6.87140362  7.21619634 -0.45364789  3.08694423  3.04299449
 [13]  9.28591171  1.46463016  6.43009863  3.84759329  7.51820999  2.98670249
 [19]  7.99945949  9.31963394  0.76697681  6.42897400  2.96185209  3.86503586
 [25]  3.96417560  6.87898579  9.40502125  6.82999246  3.24913022  7.97651724
 [31] 12.12138487 -0.57738367  2.88003792  3.35775581  1.68154266  4.65089070
 [37]  5.67606925  7.20485927  5.35622295  5.59702326  9.51918296  3.79305549
 [43] -1.59563704 -1.03299289  3.09224787  5.03424843  6.60488757  5.44012700
 [49]  5.78650990  7.06111925  4.02781026  8.12799522  8.53180293  4.73445934
 [55]  8.53278646  0.14166071  7.12359374 -0.74107671  0.05044067  5.89235414
 [61]  4.46457704  4.64367813  4.26147831  6.10202576  5.51251864  2.01261172
 [67]  0.91416541  3.85836795  4.31122417  1.99095529  0.98057469  6.02385594
 [73] -1.66843826  5.72516051 10.36533374  8.75029876  3.38458474  8.53401889
 [79]  3.76968161  1.83985212  7.88771530 12.30605187  2.18277025 -0.25162244
 [85]  2.73415245  3.85685737  5.64841719 10.83260860  3.91195816  3.76252340
 [91]  5.94851330  6.92957409  3.15594003  1.37834661  7.04974441  1.29819657
 [97]  4.20479365  3.27524250  9.29777534 -1.48975071

Sometimes we prefer to use the rtnorm function (from the msm package) because this provides the option to use a truncation interval (i.e. a lower and upper bound for the random numbers that are generated). If par7 = -Inf (minus infinity) and par8 = Inf (infinity) then the rtnorm function produces similar results as the rnorm function (when the number of simulated values is sufficiently large).

20.24 Example

We generate a series of 100 random numbers for the Normal Distribution with \(\mu = 5\) and \(\sigma = 2\). The Figure from the R Module shows the Histogram of the random numbers and the best fitting Normal Density curve.

(Interactive Shiny app available in the online version.)

The estimated parameters are obtained through ML Fitting and show the best fitting mean and standard deviation. The empirical mean and standard deviation are close to the true values specified with the sliders, and as the number of simulated values increases, the empirical estimates converge to the true values.

20.25 Property 1: Standard Normal as a Special Case

The Standard Normal Distribution is a Normal Distribution with zero mean (\(\mu = 0\)) and unit variance (\(\sigma^2 = \sigma = 1\)).

20.26 Property 2: Degeneracy as Variance Shrinks

As \(\sigma \rightarrow 0\), the Normal Distribution becomes degenerate at \(\mu\): all probability mass concentrates at the single point \(X = \mu\).

20.27 Property 3: Symmetry and Inflection Points

The normally distributed variate \(X \sim \text{N}(\mu, \sigma^2)\) is symmetrical about \(\mu\) with points of inflection at \(X = \mu \pm \sigma\).

20.28 Property 4: Uncorrelated Implies Independent (Joint Normality)

If two variates are jointly normally distributed and uncorrelated they are also independent. This is not necessarily true for other distributions.

20.29 Property 5: Sum of i.i.d. Normals

The sum of \(k\) independent variates which have a Normal Distribution N\(\left( \mu, \sigma^2 \right)\), is also Normally Distributed with mean \(k \mu\) and variance \(k \sigma^2\).

20.30 Property 6: Linear Combination of Independent Normals

Let \(X_i\) (for \(i = 1, 2, …, k\)) be \(k\) normal variates with mean \(\mu_i\) and variance \(\sigma_i^2\) then \(\sum_{i=1}^{k} c_i X_i\) is also normally distributed with mean \(\mu = \sum_{i=1}^{k} c_i \mu_i\) and variance \(\sigma^2 = \sum_{i=1}^{k} c_i^2 \sigma_i^2\) where \(c_i\) (for \(i=1, 2, …, k\)) are weighting factors.
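Property 6 is easy to verify by simulation; the weights and parameters below are assumed illustration values. With \(Y = 2 X_1 - 3 X_2\), \(X_1 \sim \text{N}(1, 2^2)\), and \(X_2 \sim \text{N}(2, 3^2)\) independent, the property predicts \(\text{E}(Y) = 2 \cdot 1 - 3 \cdot 2 = -4\) and \(\text{V}(Y) = 2^2 \cdot 4 + (-3)^2 \cdot 9 = 97\):

```r
# Simulation sketch of Property 6 (illustration values only)
set.seed(1)                                # arbitrary seed
x1 <- rnorm(1e5, mean = 1, sd = 2)
x2 <- rnorm(1e5, mean = 2, sd = 3)
y  <- 2 * x1 - 3 * x2                      # linear combination
c(mean(y), var(y))                         # close to -4 and 97
```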

20.31 Property 7: Linear Combination in the Multivariate Normal Case

Suppose that the joint distribution of \(X_i\) (for \(i= 1, 2, …, k\)) is multivariate normal and let \(\mu_i = \text{E} \left( X_i \right)\) and \(c_{ij} = \text{Cov} \left(X_i, X_j\right)\) then it follows that for any \(a \in \mathbb{R}\) and \(b_i \in \mathbb{R}\) (for \(i=1, 2, …, k\)) the random variate \(Z = a + \sum_{i=1}^{k} b_i X_i\) has a Normal Distribution with expected value \(\mu = a + \sum_{i=1}^{k} b_i \mu_i\) and variance \(\sigma^2 = \sum_{i=1}^{k} \sum_{j=1}^{k} b_i b_j c_{ij}\).

This property holds whether or not the variates \(X_i\) are independent. In the special case of independence, the variance reduces to \(\sigma^2 = \sum_{i=1}^{k} b_i^2 \text{V}(X_i)\).

20.32 Related Distributions 1: Sum of Squared Normals Gives Scaled Chi-squared

The Chi-squared variate with parameters \(n\) and \(\sigma\) is equal to the sum of the squares of \(n\) normal variates with parameters \(\mu = 0\) and variance \(\sigma^2\), i.e.

\[ \sum_{i=1}^{n} X_i^2 \sim \chi^2 \left( n, \sigma \right) \]

where \(X_i \sim \text{N} \left( 0, \sigma^2 \right)\) for \(i= 1, 2, …, n\).

20.33 Related Distributions 2: Sum of Squared Standard Normals Gives Chi-squared

The Chi-squared variate with \(n\) degrees of freedom is equal to the sum of the squares of \(n\) standard normal variates, i.e.

\[ \sum_{i=1}^{n} X_i^2 \sim \chi^2 (n) \]

where \(X_i \sim \text{N} (0,1)\) for \(i=1,2,…,n\).
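A quick simulation sketch (with assumed illustration values \(n = 5\) and 2000 replications): sums of \(n\) squared standard normal draws should exhibit the Chi-squared moments, mean \(n\) and variance \(2n\):

```r
# Sketch: 2000 sums of n = 5 squared standard normals versus
# the Chi-squared(n) moments, mean n and variance 2n
set.seed(7)                                # arbitrary seed
n <- 5
y <- colSums(matrix(rnorm(n * 2000), nrow = n)^2)
c(mean(y), var(y))                         # close to 5 and 10
```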

20.34 Related Distributions 3: Lognormal Transformation

If the variate \(X\) has a Normal Distribution with parameters \(\mu\) and \(\sigma^2\), i.e. \(X \sim \text{N} \left( \mu, \sigma^2 \right)\) then the variate \(Y\), defined as \(Y = e^X\), has the Lognormal Distribution (or Galton Distribution) with parameters \(\mu\) and \(\sigma^2\), i.e.

\[ Y \sim \text{ln N} \left( \mu, \sigma^2 \right) \]

20.35 Related Distributions 4: Student t from Normal and Chi-squared

If the variate \(X\) has a Standard Normal Distribution, \(Y\) has a Chi-squared Distribution with \(n\) degrees of freedom, and \(X\) and \(Y\) are independent then the variate \(Z = \frac{X}{\sqrt{\frac{Y}{n}}}\) has a Student t-distribution with \(n\) degrees of freedom, i.e. \(Z \sim t(n)\).
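The construction can be sketched directly in R; \(n = 10\) is an assumed illustration value, and the check uses the fact that the \(t(n)\) distribution has mean \(0\) and variance \(n/(n-2)\) for \(n > 2\):

```r
# Sketch: standard normal divided by sqrt(Chi-squared(n)/n) should follow t(n)
set.seed(11)                               # arbitrary seed
n <- 10
z <- rnorm(1e5) / sqrt(rchisq(1e5, df = n) / n)
c(mean(z), var(z))                         # close to 0 and 10/8 = 1.25
```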

20.36 Related Distributions 5: Chi Distribution from Chi-squared

If the random variate \(Y\) has a Chi-squared Distribution, i.e. \(Y \sim \chi^2 (n, \sigma)\), then the variate \(X = \sqrt{\frac{Y}{n}}\) has a Chi Distribution: \(X \sim \chi (n, \sigma)\).

The Chi Distribution is also known as the distribution of the square root of the Quadratic Mean of independent variates with distribution N\(\left( 0, \sigma^2 \right)\).

20.37 Related Distributions 6: Ratio of Two Standard Normals (Cauchy)

Let \(X \sim \text{N}(0,1)\) and \(Y \sim \text{N}(0,1)\) be independent; then the variate \(Z\), defined as \(Z = \frac{X}{Y}\), follows a Cauchy Distribution with location zero and scale one, i.e. \(Z \sim \text{Cau}(0,1)\).
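Because the Cauchy Distribution has no finite mean, a moment comparison is not meaningful; a quantile comparison works instead. This sketch checks the sample quartiles of the ratio against the standard Cauchy quartiles \(\pm 1\):

```r
# Sketch: ratio of two independent standard normals versus the
# standard Cauchy, compared via quartiles (the Cauchy has no finite mean)
set.seed(3)                                # arbitrary seed
z <- rnorm(1e5) / rnorm(1e5)
quantile(z, c(0.25, 0.75))                 # close to qcauchy(c(0.25, 0.75)) = -1, 1
```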

20.38 Purpose

Many random processes can reasonably be assumed to follow a Normal Distribution. This is of particular importance within the context of Hypothesis Testing, which is extensively discussed in the Hypothesis Testing part of this book. The Normal Distribution can also be used to extend the Multinomial Naive Bayes model from Chapter 9 so that continuous features can be used in addition to binary and count-based features.

Sturges, Herbert A. 1926. “The Choice of a Class Interval.” Journal of the American Statistical Association 21 (153): 65–66. https://doi.org/10.1080/01621459.1926.10502161.

  1. Make sure that the MASS and msm packages have been installed with the install.packages function before running the script.

© 2026 Patrick Wessa. Provided as-is, without warranty.
