Table of contents

  • 125.1 Definition
  • 125.2 One-Sample Test
    • 125.2.1 Test Statistic
    • 125.2.2 Hypotheses
    • 125.2.3 Critical Values
  • 125.3 Two-Sample Test
    • 125.3.1 Test Statistic
    • 125.3.2 Hypotheses
    • 125.3.3 Critical Values
  • 125.4 Comparison with Chi-Squared Goodness-of-Fit Test
  • 125.5 R Module
    • 125.5.1 Public website
    • 125.5.2 RFC
    • 125.5.3 R Code
  • 125.6 Visualizing the Test
  • 125.7 Example: Testing for Normality
  • 125.8 Example: Comparing Two Samples
  • 125.9 Lilliefors Test
  • 125.10 Shapiro-Wilk Test
    • 125.10.1 How It Works
    • 125.10.2 Hypotheses
    • 125.10.3 R Code
    • 125.10.4 Worked Example
    • 125.10.5 Limitations
    • 125.10.6 Comparison: When to Use Which Normality Test
  • 125.11 Pros & Cons
    • 125.11.1 Pros
    • 125.11.2 Cons
  • 125.12 Related Tests
  • 125.13 Task

125  Kolmogorov-Smirnov Test

125.1 Definition

The Kolmogorov-Smirnov (K-S) test (Kolmogorov 1933; Smirnov 1948) is a nonparametric goodness-of-fit test that compares cumulative distribution functions. It can be used as a one-sample test (comparing data to a theoretical distribution) or a two-sample test (comparing two datasets).

Unlike the Chi-squared goodness-of-fit test (Section 124.1) which requires binning continuous data into categories, the K-S test works directly with the empirical cumulative distribution function (ECDF) and preserves all information in the data.

Important: Decision Threshold Choice

The K-S test can play different roles, and the threshold should follow the role:

  • Diagnostic role (e.g. shape/assumption screening): a higher alpha (often 10% to 20%) may be reasonable to reduce false reassurance.
  • Confirmatory role (formal goodness-of-fit claim): a stricter alpha (often 1% to 5%) is usually more appropriate.

The one-sample K-S test also requires extra care: if distribution parameters are estimated from the same data, the usual K-S p-value is conservative (see the Lilliefors test section below). In practice, report the p-value together with the K-S statistic \(D\) and the role of the test. For the handbook-wide framework, see Chapter 112.

125.2 One-Sample Test

The one-sample K-S test compares the empirical distribution of a sample to a specified theoretical distribution.

125.2.1 Test Statistic

The test statistic is the maximum absolute difference between the empirical cumulative distribution function \(F_n(x)\) and the theoretical cumulative distribution function \(F(x)\):

\[ D_n = \sup_x |F_n(x) - F(x)| \]

where \(F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \leq x}\) is the empirical CDF.
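
As a minimal sketch (not part of the module), the statistic can be computed directly from this definition. The helper ks_stat below is hypothetical; for continuous data without ties it should agree with the D value reported by ks.test().

# Sketch: one-sample K-S statistic computed from its definition.
# 'ks_stat' is a hypothetical helper, not part of base R or the module.
ks_stat <- function(x, cdf, ...) {
  x <- sort(x)
  n <- length(x)
  Fx <- cdf(x, ...)                  # theoretical CDF at the ordered data
  # The supremum is attained at (or just before) a jump of the ECDF:
  d_plus  <- max((1:n) / n - Fx)
  d_minus <- max(Fx - (0:(n - 1)) / n)
  max(d_plus, d_minus)
}

set.seed(1)
x <- rnorm(25)
ks_stat(x, pnorm)                    # same value as ks.test(x, "pnorm")$statistic
ks.test(x, "pnorm")$statistic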

125.2.2 Hypotheses

\[ H_0: \text{The data follow the specified distribution } F(x) \]

\[ H_A: \text{The data do not follow the specified distribution } F(x) \]

125.2.3 Critical Values

For large samples, the critical value at significance level \(\alpha\) is approximately:

\[ D_{n,\alpha} \approx \sqrt{\frac{-\ln(\alpha/2)}{2n}} \]

Common critical values for the one-sample K-S test:

Table 125.1: Critical values for one-sample K-S test

\(\alpha\) | Critical Value
0.10       | \(1.22 / \sqrt{n}\)
0.05       | \(1.36 / \sqrt{n}\)
0.01       | \(1.63 / \sqrt{n}\)
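
The large-sample approximation is easy to evaluate directly. The short sketch below (a hypothetical helper, valid only under the asymptotic approximation above) reproduces the constants in Table 125.1.

# Sketch: approximate critical values D_{n, alpha} = sqrt(-log(alpha/2) / (2*n)).
# 'ks_crit' is a hypothetical helper based on the asymptotic formula above.
ks_crit <- function(alpha, n) sqrt(-log(alpha / 2) / (2 * n))

n <- 100
round(sapply(c(0.10, 0.05, 0.01), ks_crit, n = n), 4)
# compare with the tabulated constants divided by sqrt(n):
round(c(1.22, 1.36, 1.63) / sqrt(n), 4)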

125.3 Two-Sample Test

The two-sample K-S test compares the distributions of two independent samples to determine if they come from the same population.

125.3.1 Test Statistic

The test statistic is the maximum absolute difference between the two empirical CDFs:

\[ D_{n,m} = \sup_x |F_n(x) - G_m(x)| \]

where \(F_n(x)\) and \(G_m(x)\) are the empirical CDFs of samples of size \(n\) and \(m\) respectively.
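
A minimal sketch of this computation: since both ECDFs are step functions that jump only at observed values, the supremum can be taken over the pooled sample. The helper ks_stat2 is hypothetical and should match ks.test() for data without ties.

# Sketch: two-sample K-S statistic as the largest gap between the two ECDFs,
# evaluated over the pooled sample. 'ks_stat2' is a hypothetical helper.
ks_stat2 <- function(x, y) {
  z <- sort(c(x, y))                 # pooled evaluation points
  max(abs(ecdf(x)(z) - ecdf(y)(z)))
}

set.seed(2)
x <- rnorm(40)
y <- rnorm(50, mean = 0.5)
ks_stat2(x, y)                       # same value as ks.test(x, y)$statistic
ks.test(x, y)$statistic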

125.3.2 Hypotheses

\[ H_0: \text{Both samples come from the same distribution} \]

\[ H_A: \text{The samples come from different distributions} \]

125.3.3 Critical Values

For large samples, the null hypothesis is rejected at level \(\alpha\) if:

\[ D_{n,m} > c(\alpha) \sqrt{\frac{n + m}{nm}} \]

where \(c(0.10) = 1.22\), \(c(0.05) = 1.36\), and \(c(0.01) = 1.63\).
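
As a small illustration (hypothetical helper ks_crit2, using the asymptotic constants above), the rejection threshold for given sample sizes can be computed directly; with \(n = 50\), \(m = 60\), and \(\alpha = 0.05\) it is about 0.26.

# Sketch: large-sample rejection threshold c(alpha) * sqrt((n + m) / (n * m)).
# 'ks_crit2' is a hypothetical helper, not part of base R or the module.
ks_crit2 <- function(alpha, n, m) {
  c_alpha <- sqrt(-log(alpha / 2) / 2)   # 1.22, 1.36, 1.63 for alpha = 0.10, 0.05, 0.01
  c_alpha * sqrt((n + m) / (n * m))
}

ks_crit2(0.05, n = 50, m = 60)           # reject H0 at the 5% level if D_{n,m} exceeds this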

125.4 Comparison with Chi-Squared Goodness-of-Fit Test

Table 125.2: Comparison of K-S and Chi-squared goodness-of-fit tests

Aspect             | Kolmogorov-Smirnov                               | Chi-Squared
Data type          | Continuous                                       | Categorical or binned continuous
Binning required   | No                                               | Yes
Information loss   | None                                             | Some (due to binning)
Sensitivity        | Greatest near the center of the distribution     | Depends on bin placement
Sample size        | Works with small samples                         | Requires larger samples
Distribution-free  | Yes (for continuous distributions)               | No
Parameters         | Must be specified, not estimated                 | Can be estimated from data

For the one-sample K-S test, the theoretical distribution parameters must be specified in advance. If parameters are estimated from the data, the test becomes conservative (p-values are too large). The Lilliefors test addresses this issue for testing normality with estimated mean and variance.
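
A small simulation sketch (illustrative settings, not from the module) shows this conservativeness: when the mean and standard deviation are estimated from the same data, the naive K-S p-values are no longer uniform under \(H_0\) but pile up near 1, and the actual rejection rate falls far below the nominal level.

# Sketch: the naive one-sample K-S test with estimated parameters is conservative.
# Under H0 (data truly normal) a valid p-value is uniform on (0, 1);
# plugging in the estimated mean and sd pushes the p-values towards 1.
set.seed(7)
pvals <- replicate(2000, {
  x <- rnorm(50)
  ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
})
mean(pvals < 0.05)   # actual rejection rate, far below the nominal 0.05
hist(pvals, breaks = 20, main = "Naive K-S p-values with estimated parameters")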

125.5 R Module

125.5.1 Public website

The Kolmogorov-Smirnov Test is available on the public website:

  • https://compute.wessa.net/rwasp_kstest.wasp

125.5.2 RFC

The Kolmogorov-Smirnov Test module is available in RFC under the menu “Hypotheses / Goodness-of-Fit Tests”.

125.5.3 R Code

The following code demonstrates the one-sample K-S test for normality:

# Generate sample data
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)

# One-sample K-S test against normal distribution
# Parameters must be specified (not estimated from data)
ks.test(x, "pnorm", mean = 50, sd = 10)

    Asymptotic one-sample Kolmogorov-Smirnov test

data:  x
D = 0.071766, p-value = 0.6817
alternative hypothesis: two-sided

Testing against a uniform distribution:

# Generate uniform data
set.seed(123)
y <- runif(80, min = 0, max = 1)

# Test against standard uniform
ks.test(y, "punif", min = 0, max = 1)

    Exact one-sample Kolmogorov-Smirnov test

data:  y
D = 0.047204, p-value = 0.9905
alternative hypothesis: two-sided

The two-sample K-S test compares two samples:

# Two samples from different distributions
set.seed(456)
sample1 <- rnorm(50, mean = 0, sd = 1)
sample2 <- rnorm(60, mean = 0.5, sd = 1.2)

# Two-sample K-S test
ks.test(sample1, sample2)

    Exact two-sample Kolmogorov-Smirnov test

data:  sample1 and sample2
D = 0.21333, p-value = 0.1441
alternative hypothesis: two-sided

125.6 Visualizing the Test

The K-S test statistic can be visualized as the maximum vertical distance between the empirical and theoretical CDFs:

set.seed(42)
x <- rnorm(30, mean = 0, sd = 1)

# Compute ECDF
ecdf_x <- ecdf(x)
x_sorted <- sort(x)
ecdf_values <- ecdf_x(x_sorted)
theoretical_values <- pnorm(x_sorted, mean = 0, sd = 1)

# Find maximum difference
differences <- abs(ecdf_values - theoretical_values)
max_idx <- which.max(differences)
D_stat <- max(differences)

# Plot
plot(x_sorted, ecdf_values, type = "s", lwd = 2, col = "blue",
     xlab = "x", ylab = "Cumulative Probability",
     main = paste("K-S Test Visualization (D =", round(D_stat, 3), ")"),
     ylim = c(0, 1))
curve(pnorm(x, mean = 0, sd = 1), add = TRUE, col = "red", lwd = 2)

# Draw maximum difference
segments(x_sorted[max_idx], ecdf_values[max_idx],
         x_sorted[max_idx], theoretical_values[max_idx],
         col = "darkgreen", lwd = 3)

legend("bottomright",
       legend = c("Empirical CDF", "Theoretical CDF (Normal)", "Max difference (D)"),
       col = c("blue", "red", "darkgreen"), lwd = 2)
Figure 125.1: Visualization of the K-S test statistic as the maximum vertical distance between the empirical and theoretical CDFs

125.7 Example: Testing for Normality

A quality control engineer measures the diameter of 50 manufactured parts and wants to test if the measurements follow a normal distribution with mean 10mm and standard deviation 0.5mm:

# Simulated measurement data (slightly non-normal)
set.seed(789)
measurements <- c(rnorm(40, mean = 10, sd = 0.5),
                  rnorm(10, mean = 10.3, sd = 0.3))

# Test against specified normal distribution
result <- ks.test(measurements, "pnorm", mean = 10, sd = 0.5)
print(result)

# Summary statistics
cat("\nSample mean:", round(mean(measurements), 3), "\n")
cat("Sample SD:", round(sd(measurements), 3), "\n")

    Exact one-sample Kolmogorov-Smirnov test

data:  measurements
D = 0.12623, p-value = 0.3718
alternative hypothesis: two-sided


Sample mean: 9.968 
Sample SD: 0.435 
# Visualization
plot(ecdf(measurements), main = "Goodness-of-Fit: Measurements vs Normal",
     xlab = "Diameter (mm)", ylab = "Cumulative Probability",
     col = "blue", lwd = 2)
curve(pnorm(x, mean = 10, sd = 0.5), add = TRUE, col = "red", lwd = 2, lty = 2)
legend("bottomright",
       legend = c("Empirical CDF", "Theoretical Normal(10, 0.5)"),
       col = c("blue", "red"), lwd = 2, lty = c(1, 2))
Figure 125.2: ECDF of measurements compared to theoretical normal distribution

125.8 Example: Comparing Two Samples

A researcher wants to test if exam scores from two different teaching methods come from the same distribution:

# Exam scores from two teaching methods
set.seed(101)
method_A <- c(72, 78, 81, 85, 88, 90, 92, 95, 97, 99,
              75, 80, 83, 86, 89, 91, 94, 96, 98, 100)
method_B <- c(65, 68, 70, 73, 76, 79, 82, 84, 87, 89,
              67, 71, 74, 77, 80, 83, 85, 88, 90, 93)

# Two-sample K-S test
ks.test(method_A, method_B)

    Exact two-sample Kolmogorov-Smirnov test

data:  method_A and method_B
D = 0.4, p-value = 0.07454
alternative hypothesis: two-sided
# Plot both ECDFs
plot(ecdf(method_A), col = "blue", lwd = 2,
     main = "Comparison of Exam Score Distributions",
     xlab = "Score", ylab = "Cumulative Probability")
plot(ecdf(method_B), add = TRUE, col = "red", lwd = 2)
legend("bottomright",
       legend = c("Method A", "Method B"),
       col = c("blue", "red"), lwd = 2)
Figure 125.3: Comparison of ECDFs for two teaching methods

With \(D = 0.4\) and \(p = 0.075\), the test does not reject the null hypothesis at the 5% level; the evidence that the score distributions differ is only suggestive. In a diagnostic role with \(\alpha = 0.10\) (see the decision-threshold note above), the difference would be flagged for closer inspection.

125.9 Lilliefors Test

When testing for normality with parameters estimated from the data (rather than specified in advance), the standard K-S test is too conservative. The Lilliefors test (Lilliefors 1967) corrects for this:

# The Lilliefors test is provided by the 'nortest' package
# (run install.packages("nortest") once if it is not yet installed)
# Test normality with estimated parameters
set.seed(42)
data <- rnorm(100, mean = 50, sd = 10)

# Lilliefors test (uses estimated mean and SD)
if (requireNamespace("nortest", quietly = TRUE)) {
  nortest::lillie.test(data)
} else {
  cat("Install the 'nortest' package to run the Lilliefors test\n")
}

# Compare with standard K-S using estimated parameters (conservative)
ks.test(data, "pnorm", mean = mean(data), sd = sd(data))

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  data
D = 0.055683, p-value = 0.6253


    Asymptotic one-sample Kolmogorov-Smirnov test

data:  data
D = 0.055683, p-value = 0.9158
alternative hypothesis: two-sided

Note that the Lilliefors test gives a smaller p-value because it correctly accounts for parameter estimation.

125.10 Shapiro-Wilk Test

The Shapiro-Wilk test (Shapiro and Wilk 1965) is widely regarded as one of the most powerful tests for normality, especially for small to moderate sample sizes (\(n < 50\)). It is specifically designed for testing whether data come from a normal distribution, unlike the K-S test, which is a general-purpose goodness-of-fit test.

125.10.1 How It Works

The test statistic \(W\) is based on the correlation between the data and the expected normal order statistics:

\[ W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

where \(x_{(i)}\) are the ordered sample values, \(\bar{x}\) is the sample mean, and \(a_i\) are constants derived from the means, variances, and covariances of the order statistics of a sample of size \(n\) from a standard normal distribution.

Intuitively, \(W\) measures how well the ordered data values match what we would expect from a normal distribution. Values of \(W\) close to 1 indicate normality; values significantly less than 1 indicate departure from normality.
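
Computing the exact weights \(a_i\) is involved, but the idea can be sketched with the closely related Shapiro-Francia statistic: the squared correlation between the ordered data and approximate expected normal order statistics. The code below illustrates this principle only; it is not the exact \(W\) computed by shapiro.test().

# Sketch: Shapiro-Francia-style approximation to the idea behind W.
# This is NOT the exact Shapiro-Wilk statistic, only an illustration.
set.seed(42)
x <- rnorm(50, mean = 100, sd = 15)

x_ord  <- sort(x)
scores <- qnorm(ppoints(length(x)))   # approximate expected normal order statistics
W_approx <- cor(x_ord, scores)^2      # close to 1 when the data look normal

W_approx
shapiro.test(x)$statistic             # exact W, typically close for near-normal data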

125.10.2 Hypotheses

\[ H_0: \text{The data are normally distributed} \]

\[ H_A: \text{The data are not normally distributed} \]

125.10.3 R Code

The Shapiro-Wilk test is available in base R without any additional packages:

# Example 1: Normal data
set.seed(42)
normal_data <- rnorm(50, mean = 100, sd = 15)
cat("Test on normally distributed data:\n")
shapiro.test(normal_data)
Test on normally distributed data:

    Shapiro-Wilk normality test

data:  normal_data
W = 0.98021, p-value = 0.5611
# Example 2: Non-normal data (exponential)
set.seed(42)
skewed_data <- rexp(50, rate = 0.1)
cat("Test on exponentially distributed data:\n")
shapiro.test(skewed_data)
Test on exponentially distributed data:

    Shapiro-Wilk normality test

data:  skewed_data
W = 0.68815, p-value = 5.102e-09

The first test (normal data) yields a large p-value, so we do not reject normality. The second test (exponential data) yields a very small p-value, providing strong evidence against normality.

125.10.4 Worked Example

A manufacturer measures the tensile strength (in MPa) of 20 steel specimens and wants to verify that the measurements are normally distributed before applying a t-test:

# Tensile strength measurements
set.seed(123)
strength <- c(rnorm(18, mean = 450, sd = 20), 510, 520)

cat("Summary statistics:\n")
cat("Mean:", round(mean(strength), 1), "MPa\n")
cat("SD:", round(sd(strength), 1), "MPa\n")
cat("n:", length(strength), "\n\n")

# Shapiro-Wilk test
shapiro.test(strength)
Summary statistics:
Mean: 459.1 MPa
SD: 27.1 MPa
n: 20 


    Shapiro-Wilk normality test

data:  strength
W = 0.95537, p-value = 0.456
qqnorm(strength, main = "Normal QQ Plot - Tensile Strength")
qqline(strength, col = "red", lwd = 2)
Figure 125.4: QQ plot of tensile strength data for visual normality assessment

125.10.5 Limitations

  • The sample size \(n\) must be between 3 and 5000
  • The test is sensitive to ties (repeated values); with many ties, the p-value may be unreliable
  • Like all hypothesis tests, the Shapiro-Wilk test becomes very sensitive with large samples: even trivial departures from normality may produce significant results when \(n\) is large (see the sketch after this list)
  • It tests only for normality, not for other distributions (unlike the K-S test)
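
The large-sample sensitivity mentioned above can be seen in a short sketch (illustrative settings): a \(t\) distribution with 10 degrees of freedom is visually very close to normal, yet with \(n = 4000\) the Shapiro-Wilk test almost always rejects.

# Sketch: with a large sample, even a mild departure from normality is flagged.
# Illustrative settings; t with 10 df has only slightly heavier tails than normal.
set.seed(99)
x_large <- rt(4000, df = 10)
shapiro.test(x_large)$p.value   # typically far below 0.05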

125.10.6 Comparison: When to Use Which Normality Test

Table 125.3: Comparison of normality tests

Test                       | Best For                                                          | Limitations                                                               | R Function
Shapiro-Wilk               | Small to moderate samples (\(n < 50\)); highest power             | Limited to \(n \leq 5000\); normality only                                | shapiro.test()
Lilliefors (Section 125.9) | Moderate to large samples; parameters estimated from data         | Less powerful than Shapiro-Wilk for small \(n\)                           | nortest::lillie.test()
K-S test (Section 125.2)   | Any distribution (not just normal); parameters known in advance   | Conservative when parameters are estimated; less powerful for normality  | ks.test()
QQ plot (Chapter 76)       | Visual assessment at any sample size                              | Subjective; no formal p-value                                             | qqnorm()

In practice, it is recommended to use the Shapiro-Wilk test as the primary formal test for normality, supplemented by a QQ plot for visual assessment.

125.11 Pros & Cons

125.11.1 Pros

The Kolmogorov-Smirnov test has the following advantages:

  • Does not require binning of continuous data, preserving all information.
  • Distribution-free for the two-sample case.
  • Works well with small sample sizes.
  • Sensitive to differences in location, scale, and shape of distributions.
  • Easy to visualize and interpret.

125.11.2 Cons

The Kolmogorov-Smirnov test has the following disadvantages:

  • For one-sample tests, parameters must be specified in advance (not estimated from data).
  • Less powerful than some alternatives (e.g., Anderson-Darling, Shapiro-Wilk for normality).
  • Most sensitive to differences near the center of the distribution, less sensitive in the tails.
  • Applies only to continuous distributions.
  • Ties in the data require special handling.

125.12 Related Tests

  • Anderson-Darling test: Similar to K-S but gives more weight to the tails of the distribution. Often more powerful for detecting departures from normality (see the sketch after this list).
  • Shapiro-Wilk test: Specifically designed for testing normality; generally more powerful than K-S for this purpose.
  • Cramér-von Mises test: Uses the integral of the squared difference between CDFs rather than the maximum difference.
  • Chi-squared goodness-of-fit test (Section 124.1): Requires binning but can handle discrete distributions and estimated parameters.
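
The Anderson-Darling and Cramér-von Mises normality tests are not in base R, but (like the Lilliefors test above) they are available in the nortest package. A minimal sketch, assuming nortest is installed:

# Sketch: Anderson-Darling and Cramer-von Mises normality tests from 'nortest'
# (the same add-on package used above for the Lilliefors test).
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)

if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(x))    # Anderson-Darling: more weight in the tails
  print(nortest::cvm.test(x))   # Cramer-von Mises: integrated squared difference
} else {
  cat("Install the 'nortest' package to run these tests\n")
}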

125.13 Task

  1. Generate 100 random numbers from an exponential distribution with rate \(\lambda = 0.5\). Use the K-S test to verify that the data follow this distribution.

  2. Generate two samples: one from a standard normal distribution and one from a t-distribution with 5 degrees of freedom. Use the two-sample K-S test to determine if they differ significantly.

  3. The following data represent waiting times (in minutes) at a service counter: 2.1, 0.5, 3.2, 1.8, 4.5, 0.9, 2.7, 1.2, 5.1, 0.3. Test whether these data follow an exponential distribution with mean 2 minutes.

  4. Compare the K-S test and Chi-squared goodness-of-fit test on the same dataset. Discuss the advantages and disadvantages of each approach.

Kolmogorov, Andrey N. 1933. “Sulla Determinazione Empirica Di Una Legge Di Distribuzione.” Giornale Dell’Istituto Italiano Degli Attuari 4: 83–91.
Lilliefors, Hubert W. 1967. “On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown.” Journal of the American Statistical Association 62 (318): 399–402. https://doi.org/10.1080/01621459.1967.10482916.
Shapiro, S. S., and M. B. Wilk. 1965. “An Analysis of Variance Test for Normality (Complete Samples).” Biometrika 52 (3/4): 591–611. https://doi.org/10.1093/biomet/52.3-4.591.
Smirnov, Nickolay. 1948. “Table for Estimating the Goodness of Fit of Empirical Distributions.” The Annals of Mathematical Statistics 19 (2): 279–81. https://doi.org/10.1214/aoms/1177730256.
