
43  Sample Correlation Distribution

When \(n\) bivariate normal observations are drawn under zero population correlation (\(\rho = 0\)), the sample correlation coefficient \(r\) follows a specific symmetric distribution on \((-1, 1)\). This sampling distribution provides the exact basis for testing whether an observed correlation is statistically distinguishable from zero.

Formally, when \((X_i, Y_i)\) are i.i.d. bivariate Normal with correlation \(\rho = 0\), the sample Pearson correlation \(R\) defined on \((-1, 1)\) follows the distribution \(R \sim \text{CorDist}(n)\) with sample size \(n \geq 3\).

Note: This chapter describes the distribution of the sample correlation under \(H_0: \rho = 0\). For the full hypothesis test of \(H_0: \rho = 0\), including software output and interpretation, see Chapter 71.

Warning: Scope and Applicability

This distribution applies to Pearson correlation computed from i.i.d. paired observations under a bivariate Normal model and the null hypothesis \(H_0:\rho = 0\). It should not be used for sample autocorrelations of a single time series, nor for correlations between serially dependent time series. In those settings, inference is usually based on large-sample approximations such as white-noise confidence bands, Bartlett-type formulas, or large-lag variance results rather than the exact finite-sample distribution described here.

43.1 Probability Density Function

\[ f(r) = \frac{(1 - r^2)^{(n-4)/2}}{\text{B}\!\left(\tfrac{1}{2},\, \tfrac{n-2}{2}\right)}, \quad -1 < r < 1 \]

The figure below shows examples of the sample correlation distribution for different sample sizes.

Code
dcorr <- function(r, n) {
  ifelse(abs(r) < 1,
         (1 - r^2)^((n - 4)/2) / beta(0.5, (n - 2)/2),
         0)
}

par(mfrow = c(2, 2))
r <- seq(-1, 1, length = 500)

plot(r, dcorr(r, 5), type = "l", lwd = 2, col = "blue",
     xlab = "r", ylab = "f(r)", main = "n = 5")

plot(r, dcorr(r, 10), type = "l", lwd = 2, col = "blue",
     xlab = "r", ylab = "f(r)", main = "n = 10")

plot(r, dcorr(r, 30), type = "l", lwd = 2, col = "blue",
     xlab = "r", ylab = "f(r)", main = "n = 30")

plot(r, dcorr(r, 100), type = "l", lwd = 2, col = "blue",
     xlab = "r", ylab = "f(r)", main = "n = 100")

par(mfrow = c(1, 1))
Figure 43.1: Sampling distribution of Pearson r under H0: rho = 0 for various sample sizes

43.2 Purpose

The sample correlation distribution serves as the null reference distribution for testing independence between two continuous variables in a bivariate normal population. It enables exact inference for any sample size \(n \geq 3\) without requiring large-sample approximations. Common applications include:

  • Testing whether two continuous variables are linearly uncorrelated in a bivariate normal population
  • Determining critical values for the Pearson correlation test statistic
  • Computing exact p-values for observed correlations under \(H_0: \rho = 0\)
  • Understanding the sampling variability of \(r\) when the true correlation is zero
  • Teaching the foundations of correlation hypothesis testing

Relation to the discrete setting. This is an inferential tool, not a data-generating model. Its closest discrete analog is the null distribution of a sample proportion under \(H_0: p = 0.5\): both are symmetric reference distributions used only for hypothesis testing, not for modeling data.

43.3 Distribution Function

The CDF is obtained via the regularized incomplete beta function. In R:

# CDF of sample correlation under H0: rho = 0
pcorr <- function(r, n) {
  pbeta((1 + r) / 2, shape1 = (n - 2) / 2, shape2 = (n - 2) / 2)
}

The figure below shows the sample correlation CDF for \(n = 30\).

Code
pcorr <- function(r, n) {
  pbeta((1 + r) / 2, shape1 = (n - 2) / 2, shape2 = (n - 2) / 2)
}

r <- seq(-1, 1, length = 500)
plot(r, pcorr(r, 30), type = "l", lwd = 2, col = "blue",
     xlab = "r", ylab = "F(r)", main = "Sample Correlation CDF (n = 30)",
     sub = expression(H[0] ~ ":" ~ rho == 0))
Figure 43.2: CDF of sample correlation under H0: rho = 0 (n = 30)

43.4 Moment Generating Function

The MGF does not have a simple closed form. Moments are derived from the Beta distribution representation.
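The moment formulas in the following sections can be checked numerically by integrating against the density. A minimal sketch (the dcorr helper simply mirrors the pdf defined above):

```r
# Numerical check of the null moments via direct integration of the pdf
dcorr <- function(r, n) (1 - r^2)^((n - 4) / 2) / beta(0.5, (n - 2) / 2)

n  <- 10
m2 <- integrate(function(r) r^2 * dcorr(r, n), -1, 1)$value
m4 <- integrate(function(r) r^4 * dcorr(r, n), -1, 1)$value

c(m2 = m2, theory = 1 / (n - 1))              # E(R^2) = 1/(n-1)
c(m4 = m4, theory = 3 / ((n - 1) * (n + 1)))  # E(R^4) = 3/((n-1)(n+1))
```

The odd moments vanish by symmetry, so only the even moments need checking.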

43.5 1st Uncentered Moment

\[ \mu_1' = \text{E}(R) = 0 \]

By symmetry of the distribution about zero under \(H_0: \rho = 0\).

43.6 2nd Uncentered Moment

\[ \mu_2' = \text{E}(R^2) = \frac{1}{n-1} \]

43.7 3rd Uncentered Moment

\[ \mu_3' = 0 \]

By symmetry.

43.8 4th Uncentered Moment

\[ \mu_4' = \frac{3}{(n-1)(n+1)} \]

43.9 2nd Centered Moment

\[ \mu_2 = \text{Var}(R) = \frac{1}{n-1} \]

43.10 3rd Centered Moment

\[ \mu_3 = 0 \]

43.11 4th Centered Moment

\[ \mu_4 = \frac{3}{(n-1)(n+1)} \]

43.12 Expected Value

\[ \text{E}(R) = 0 \]

43.13 Variance

\[ \text{Var}(R) = \frac{1}{n-1} \]

As \(n\) increases, the sampling distribution of \(r\) concentrates around zero, reflecting increased precision of the correlation estimate.
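Concretely, the null standard deviation \(1/\sqrt{n-1}\) for a few sample sizes (a quick sketch):

```r
# Null standard deviation of r for several sample sizes
n <- c(5, 10, 30, 100)
round(1 / sqrt(n - 1), 3)
# 0.500 0.333 0.186 0.101
```

Even at \(n = 30\), null correlations around \(\pm 0.37\) (two standard deviations) are unremarkable.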

43.14 Median

\[ \text{Med}(R) = 0 \]

43.15 Mode

\[ \text{Mo}(R) = 0 \]

Both the median and mode equal zero, reflecting the perfect symmetry of the null distribution.

43.16 Coefficient of Skewness

\[ g_1 = 0 \]

The distribution is symmetric about zero under \(H_0: \rho = 0\).

43.17 Coefficient of Kurtosis

\[ g_2 = \frac{\mu_4}{\mu_2^2} = \frac{3/\big((n-1)(n+1)\big)}{1/(n-1)^2} = \frac{3(n-1)}{n+1} \]

For large \(n\), \(g_2 \to 3\) (approaching the Normal), consistent with the asymptotic normality of \(r\).

43.18 Key Result: t-Transform

The test statistic

\[ T = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}} \]

follows a \(t\)-distribution with \(n-2\) degrees of freedom under \(H_0: \rho = 0\) (see Chapter 25). This is the most commonly used result for testing correlation.
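A quick simulation check of this result (a sketch with 2000 replications and an arbitrary seed):

```r
# Simulate T = r*sqrt(n-2)/sqrt(1-r^2) under H0 and compare to t(n-2)
set.seed(1)
n <- 15
t_sims <- replicate(2000, {
  r <- cor(rnorm(n), rnorm(n))   # independent normals: rho = 0
  r * sqrt(n - 2) / sqrt(1 - r^2)
})
ks.test(t_sims, "pt", df = n - 2)  # a large p-value is consistent with t(n-2)
```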

43.19 Parameter Estimation

The sample correlation coefficient \(r = \sum(x_i-\bar x)(y_i - \bar y) / \sqrt{\sum(x_i-\bar x)^2\sum(y_i-\bar y)^2}\) is computed directly from data. The parameter \(n\) is the sample size.
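The closed-form expression can be verified against R's built-in cor() on toy data (a sketch):

```r
# Manual Pearson r from the definition vs. the built-in cor()
set.seed(42)
x <- rnorm(8); y <- rnorm(8)
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
all.equal(r_manual, cor(x, y))  # TRUE
```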

43.20 R Module

43.20.1 RFC

The sample correlation distribution is covered in the context of the correlation test: see the “Hypothesis Testing / Pearson Correlation Test” module in RFC.

43.20.2 Direct app link

  • https://shiny.wessa.net/rdistribution/

43.20.3 R Code

The following code demonstrates the t-transform test for a correlation coefficient:

r <- 0.42; n <- 30

# t-test statistic for H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

cat("r =", r, "\n")
cat("n =", n, "\n")
cat("t =", round(t_stat, 4), "\n")
cat("df =", n - 2, "\n")
cat("p-value (two-sided) =", round(p_val, 4), "\n")
r = 0.42 
n = 30 
t = 2.4489 
df = 28 
p-value (two-sided) = 0.0208 

43.21 Example

With \(n = 30\) observations, we observe a sample correlation of \(r = 0.42\). We test \(H_0: \rho = 0\) against \(H_1: \rho \neq 0\).

r <- 0.42; n <- 30

# t-test statistic
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

cat("t =", round(t_stat, 3), "  p =", round(p_val, 4), "\n")

if (p_val < 0.05) {
  cat("Conclusion: reject H0 at 5% level\n")
} else {
  cat("Conclusion: fail to reject H0 at 5% level\n")
}

# 95% critical value
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(t_crit^2 + n - 2)
cat("Critical r (two-sided, alpha = 0.05):", round(r_crit, 4), "\n")
t = 2.449   p = 0.0208 
Conclusion: reject H0 at 5% level
Critical r (two-sided, alpha = 0.05): 0.361 

43.22 Random Number Generator

Sample correlations under \(H_0: \rho = 0\) can be simulated by generating bivariate normal data with zero correlation and computing the sample correlation:

set.seed(123)
n <- 30; N_sim <- 1000

# Simulate N_sim sample correlations under H0: rho = 0
r_sims <- replicate(N_sim, {
  x <- rnorm(n); y <- rnorm(n)
  cor(x, y)
})

cat("Simulated mean of r:", round(mean(r_sims), 4), "\n")
cat("Theoretical mean:", 0, "\n")
cat("Simulated var of r:", round(var(r_sims), 4), "\n")
cat("Theoretical var:", round(1/(n-1), 4), "\n")
Simulated mean of r: 0.006 
Theoretical mean: 0 
Simulated var of r: 0.0357 
Theoretical var: 0.0345 
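Beyond matching the first two moments, the simulated correlations can be compared to the exact null CDF from Section 43.3 with a Kolmogorov-Smirnov test (a sketch; pcorr as defined earlier):

```r
# KS goodness-of-fit of simulated null correlations to the exact CDF
pcorr <- function(r, n) pbeta((1 + r) / 2, (n - 2) / 2, (n - 2) / 2)

set.seed(123)
n <- 30
r_sims <- replicate(2000, cor(rnorm(n), rnorm(n)))

ks.test(r_sims, function(q) pcorr(q, n))  # a large p-value indicates good fit
```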

43.23 Property 1: t-Transform

Under \(H_0: \rho = 0\):

\[ T = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}} \sim t(n-2) \]

This is the fundamental result that makes correlation testing tractable with standard t-tables. See Chapter 25.

43.24 Property 2: Asymptotic Normality

As \(n \to \infty\):

\[ \sqrt{n-1}\, R \xrightarrow{d} N(0, 1) \]

The variance \(1/(n-1)\) decreases to zero, so under \(H_0\) the sample correlation concentrates around zero, the true population correlation.
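The quality of the normal approximation can be gauged by comparing the exact CDF of \(\sqrt{n-1}\,R\) with the standard normal on a grid (a sketch using pcorr from Section 43.3):

```r
# Max CDF distance between sqrt(n-1)*R and N(0,1) on a z-grid
pcorr <- function(r, n) pbeta((1 + r) / 2, (n - 2) / 2, (n - 2) / 2)

n <- 500
z <- seq(-3, 3, by = 0.25)
max(abs(pcorr(z / sqrt(n - 1), n) - pnorm(z)))  # already very small for n = 500
```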

43.25 Property 3: Beta Representation

Under \(H_0: \rho = 0\), the shifted correlation \((1+R)/2\) follows:

\[ \frac{1+R}{2} \sim \text{Beta}\!\left(\frac{n-2}{2},\, \frac{n-2}{2}\right) \]

This connects the correlation distribution directly to the Beta distribution family. See Chapter 30.
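This representation yields null quantiles of \(r\) directly via qbeta, and these agree with the critical values obtained by inverting the t-transform of the Key Result. A sketch (the qcorr helper is an illustrative name, not a base R function):

```r
# Null quantiles of r via the Beta representation
qcorr <- function(p, n) 2 * qbeta(p, (n - 2) / 2, (n - 2) / 2) - 1

n <- 30
r_beta <- qcorr(0.975, n)

# Same critical value obtained by inverting the t-transform
t_crit <- qt(0.975, df = n - 2)
r_t    <- t_crit / sqrt(t_crit^2 + n - 2)

all.equal(r_beta, r_t)  # TRUE; both equal ~0.361
```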

43.26 Related Distributions 1: Student’s t-Distribution

The t-transform \(T = R\sqrt{n-2}/\sqrt{1-R^2}\) follows the Student’s t-distribution with \(n-2\) degrees of freedom (see Chapter 25).

43.27 Related Distributions 2: Beta Distribution

The shifted correlation \((1+R)/2\) follows a Beta distribution with equal shape parameters (see Chapter 30).


© 2026 Patrick Wessa. Provided as-is, without warranty.
