The Kolmogorov-Smirnov (K-S) test (Kolmogorov 1933; Smirnov 1948) is a nonparametric goodness-of-fit test that compares cumulative distribution functions. It can be used as a one-sample test (comparing data to a theoretical distribution) or a two-sample test (comparing two datasets).
Unlike the Chi-squared goodness-of-fit test (Section 124.1), which requires binning continuous data into categories, the K-S test works directly with the empirical cumulative distribution function (ECDF) and therefore preserves all information in the data.
Important: Decision Threshold Choice
The K-S test can play different roles, and the threshold should follow the role:
Diagnostic role (e.g. shape/assumption screening): a higher alpha (often 10% to 20%) may be reasonable to reduce false reassurance.
Confirmatory role (formal goodness-of-fit claim): a stricter alpha (often 1% to 5%) is usually more appropriate.
The one-sample K-S test also requires extra care: if distribution parameters are estimated from the same data, the usual K-S p-value is conservative (see the Lilliefors test section below). In practice, report the p-value together with the K-S statistic \(D\) and the role of the test. For the handbook-wide framework, see Chapter 112.
125.2 One-Sample Test
The one-sample K-S test compares the empirical distribution of a sample to a specified theoretical distribution.
125.2.1 Test Statistic
The test statistic is the maximum absolute difference between the empirical cumulative distribution function \(F_n(x)\) and the theoretical cumulative distribution function \(F(x)\):
\[
D_n = \sup_x |F_n(x) - F(x)|
\]
where \(F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \leq x}\) is the empirical CDF.
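As a quick check of this definition, the ECDF can be computed by hand and compared with base R's `ecdf()` (the data vector here is arbitrary, purely for illustration):

```r
# ECDF by hand: F_n(t) is the fraction of observations <= t
x <- c(2.3, 1.1, 3.7, 0.4, 2.9)   # arbitrary illustrative data
Fn_manual <- function(t) mean(x <= t)

# Base R equivalent
Fn <- ecdf(x)

Fn_manual(2.5)   # 0.6 (three of the five values are <= 2.5)
Fn(2.5)          # 0.6
```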
125.2.2 Hypotheses
\[
H_0: \text{The data follow the specified distribution } F(x)
\]
\[
H_A: \text{The data do not follow the specified distribution } F(x)
\]
125.2.3 Critical Values
For large samples, the critical value at significance level \(\alpha\) is approximately \(c(\alpha) / \sqrt{n}\), where \(c(\alpha) = \sqrt{-\tfrac{1}{2} \ln(\alpha/2)}\). Common critical values for the one-sample K-S test:

Table 125.1: Critical values for one-sample K-S test

| \(\alpha\) | Critical Value |
|------------|----------------------|
| 0.10       | \(1.22 / \sqrt{n}\) |
| 0.05       | \(1.36 / \sqrt{n}\) |
| 0.01       | \(1.63 / \sqrt{n}\) |
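The tabulated constants come from \(c(\alpha) = \sqrt{-\ln(\alpha/2)/2}\); a small helper (an illustrative sketch, not part of base R, which computes p-values directly via `ks.test()`) makes the threshold explicit:

```r
# Approximate large-sample critical value for the one-sample K-S test
ks_critical <- function(n, alpha = 0.05) {
  c_alpha <- sqrt(-log(alpha / 2) / 2)   # 1.36 at alpha = 0.05
  c_alpha / sqrt(n)
}

ks_critical(100)         # roughly 0.136, i.e. 1.36 / sqrt(100)
ks_critical(100, 0.01)   # roughly 0.163
```

Reject \(H_0\) at level \(\alpha\) when the observed \(D_n\) exceeds this value.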
125.3 Two-Sample Test
The two-sample K-S test compares the distributions of two independent samples to determine if they come from the same population.
125.3.1 Test Statistic
The test statistic is the maximum absolute difference between the two empirical CDFs:
\[
D_{n,m} = \sup_x |F_n(x) - G_m(x)|
\]
where \(F_n(x)\) and \(G_m(x)\) are the empirical CDFs of samples of size \(n\) and \(m\) respectively.
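Since both ECDFs are step functions, \(D_{n,m}\) only needs to be evaluated at the pooled sample points. A minimal sketch (with simulated samples, assuming no ties) that reproduces `ks.test()`'s statistic:

```r
# D_{n,m} by hand: largest gap between the two ECDFs over the pooled sample
set.seed(1)
a <- rnorm(40)                  # illustrative samples
b <- rnorm(50, mean = 0.4)
pooled <- sort(c(a, b))
D <- max(abs(ecdf(a)(pooled) - ecdf(b)(pooled)))

# Agrees with the built-in statistic (up to floating-point tolerance)
isTRUE(all.equal(D, unname(ks.test(a, b)$statistic)))
```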
125.3.2 Hypotheses
\[
H_0: \text{Both samples come from the same distribution}
\]
\[
H_A: \text{The samples come from different distributions}
\]
125.3.3 Critical Values
For large samples, the null hypothesis is rejected at level \(\alpha\) if:
\[
D_{n,m} > c(\alpha) \sqrt{\frac{n + m}{nm}}
\]
where \(c(0.10) = 1.22\), \(c(0.05) = 1.36\), and \(c(0.01) = 1.63\).
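The same constants \(c(\alpha)\) give a ready-made rejection threshold for given sample sizes; the helper below is an illustrative sketch, not part of base R:

```r
# Large-sample rejection threshold for the two-sample K-S test
ks2_critical <- function(n, m, alpha = 0.05) {
  c_alpha <- sqrt(-log(alpha / 2) / 2)   # 1.36 at alpha = 0.05
  c_alpha * sqrt((n + m) / (n * m))
}

# Reject H0 at the 5% level if D_{n,m} exceeds this
ks2_critical(50, 60)   # roughly 0.26
```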
125.4 Comparison with Chi-Squared Goodness-of-Fit Test
Table 125.2: Comparison of K-S and Chi-squared goodness-of-fit tests

| Aspect | Kolmogorov-Smirnov | Chi-Squared |
|--------|--------------------|-------------|
| Data type | Continuous | Categorical or binned continuous |
| Binning required | No | Yes |
| Information loss | None | Some (due to binning) |
| Sensitivity | Uniform across distribution | Depends on bin placement |
| Sample size | Works with small samples | Requires larger samples |
| Distribution-free | Yes (for two-sample) | No |
| Parameters | Must be specified, not estimated | Can be estimated from data |
For the one-sample K-S test, the theoretical distribution parameters must be specified in advance. If parameters are estimated from the data, the test becomes conservative (p-values are too large). The Lilliefors test addresses this issue for testing normality with estimated mean and variance.
125.5 R Module
125.5.1 Public website
The Kolmogorov-Smirnov Test is available on the public website:
125.5.2 RFC
The Kolmogorov-Smirnov Test module is available in RFC under the menu “Hypotheses / Goodness-of-Fit Tests”.
125.5.3 R Code
The following code demonstrates the one-sample K-S test for normality:
# Generate sample data
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)

# One-sample K-S test against normal distribution
# Parameters must be specified (not estimated from data)
ks.test(x, "pnorm", mean = 50, sd = 10)
Asymptotic one-sample Kolmogorov-Smirnov test
data: x
D = 0.071766, p-value = 0.6817
alternative hypothesis: two-sided
Testing against a uniform distribution:
# Generate uniform data
set.seed(123)
y <- runif(80, min = 0, max = 1)

# Test against standard uniform
ks.test(y, "punif", min = 0, max = 1)
Exact one-sample Kolmogorov-Smirnov test
data: y
D = 0.047204, p-value = 0.9905
alternative hypothesis: two-sided
The two-sample K-S test compares two samples:
# Two samples from different distributions
set.seed(456)
sample1 <- rnorm(50, mean = 0, sd = 1)
sample2 <- rnorm(60, mean = 0.5, sd = 1.2)

# Two-sample K-S test
ks.test(sample1, sample2)
Exact two-sample Kolmogorov-Smirnov test
data: sample1 and sample2
D = 0.21333, p-value = 0.1441
alternative hypothesis: two-sided
125.6 Visualizing the Test
The K-S test statistic can be visualized as the maximum vertical distance between the empirical and theoretical CDFs:
set.seed(42)
x <- rnorm(30, mean = 0, sd = 1)

# Compute ECDF
ecdf_x <- ecdf(x)
x_sorted <- sort(x)
ecdf_values <- ecdf_x(x_sorted)
theoretical_values <- pnorm(x_sorted, mean = 0, sd = 1)

# Find maximum difference
differences <- abs(ecdf_values - theoretical_values)
max_idx <- which.max(differences)
D_stat <- max(differences)

# Plot
plot(x_sorted, ecdf_values, type = "s", lwd = 2, col = "blue",
     xlab = "x", ylab = "Cumulative Probability",
     main = paste("K-S Test Visualization (D =", round(D_stat, 3), ")"),
     ylim = c(0, 1))
curve(pnorm(x, mean = 0, sd = 1), add = TRUE, col = "red", lwd = 2)

# Draw maximum difference
segments(x_sorted[max_idx], ecdf_values[max_idx],
         x_sorted[max_idx], theoretical_values[max_idx],
         col = "darkgreen", lwd = 3)
legend("bottomright",
       legend = c("Empirical CDF", "Theoretical CDF (Normal)", "Max difference (D)"),
       col = c("blue", "red", "darkgreen"), lwd = 2)
Figure 125.1: Visualization of K-S test statistic as maximum distance between ECDFs
125.7 Example: Testing for Normality
A quality control engineer measures the diameter of 50 manufactured parts and wants to test if the measurements follow a normal distribution with mean 10mm and standard deviation 0.5mm:
# Simulated measurement data (slightly non-normal)
set.seed(789)
measurements <- c(rnorm(40, mean = 10, sd = 0.5),
                  rnorm(10, mean = 10.3, sd = 0.3))

# Test against specified normal distribution
result <- ks.test(measurements, "pnorm", mean = 10, sd = 0.5)
print(result)

# Summary statistics
cat("\nSample mean:", round(mean(measurements), 3), "\n")
cat("Sample SD:", round(sd(measurements), 3), "\n")
Exact one-sample Kolmogorov-Smirnov test
data: measurements
D = 0.12623, p-value = 0.3718
alternative hypothesis: two-sided
Sample mean: 9.968
Sample SD: 0.435
# Visualization
plot(ecdf(measurements), main = "Goodness-of-Fit: Measurements vs Normal",
     xlab = "Diameter (mm)", ylab = "Cumulative Probability",
     col = "blue", lwd = 2)
curve(pnorm(x, mean = 10, sd = 0.5), add = TRUE, col = "red", lwd = 2, lty = 2)
legend("bottomright",
       legend = c("Empirical CDF", "Theoretical Normal(10, 0.5)"),
       col = c("blue", "red"), lwd = 2, lty = c(1, 2))
Figure 125.2: ECDF of measurements compared to theoretical normal distribution
125.8 Example: Comparing Two Samples
A researcher wants to test if exam scores from two different teaching methods come from the same distribution:
Exact two-sample Kolmogorov-Smirnov test
data: method_A and method_B
D = 0.4, p-value = 0.07454
alternative hypothesis: two-sided
# Plot both ECDFs
plot(ecdf(method_A), col = "blue", lwd = 2,
     main = "Comparison of Exam Score Distributions",
     xlab = "Score", ylab = "Cumulative Probability")
plot(ecdf(method_B), add = TRUE, col = "red", lwd = 2)
legend("bottomright",
       legend = c("Method A", "Method B"),
       col = c("blue", "red"), lwd = 2)
Figure 125.3: Comparison of ECDFs for two teaching methods
With \(p = 0.075\), the evidence that the score distributions differ is suggestive but falls short of significance at the 5% level; under the more lenient 10% threshold appropriate for a diagnostic role, the difference would be flagged.
125.9 Lilliefors Test
When testing for normality with parameters estimated from the data (rather than specified in advance), the standard K-S test is too conservative. The Lilliefors test (Lilliefors 1967) corrects for this:
# Install nortest package if needed
install.packages("nortest")
# Test normality with estimated parameters
set.seed(42)
data <- rnorm(100, mean = 50, sd = 10)

# Lilliefors test (uses estimated mean and SD)
if (requireNamespace("nortest", quietly = TRUE)) {
  nortest::lillie.test(data)
} else {
  cat("Install the 'nortest' package to run the Lilliefors test\n")
}

# Compare with standard K-S using estimated parameters (conservative)
ks.test(data, "pnorm", mean = mean(data), sd = sd(data))
Lilliefors (Kolmogorov-Smirnov) normality test
data: data
D = 0.055683, p-value = 0.6253
Asymptotic one-sample Kolmogorov-Smirnov test
data: data
D = 0.055683, p-value = 0.9158
alternative hypothesis: two-sided
Note that the Lilliefors test gives a smaller p-value because it correctly accounts for parameter estimation.
125.10 Shapiro-Wilk Test
The Shapiro-Wilk test (Shapiro and Wilk 1965) is widely regarded as the most powerful test for normality, especially for small to moderate sample sizes (\(n < 50\)). It is specifically designed for testing whether data come from a normal distribution, unlike the K-S test which is a general-purpose goodness-of-fit test.
125.10.1 How It Works
The test statistic \(W\) is based on the correlation between the data and the expected normal order statistics:
\[
W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]
where \(x_{(i)}\) are the ordered sample values, \(\bar{x}\) is the sample mean, and \(a_i\) are constants derived from the means, variances, and covariances of the order statistics of a sample of size \(n\) from a standard normal distribution.
Intuitively, \(W\) measures how well the ordered data values match what we would expect from a normal distribution. Values of \(W\) close to 1 indicate normality; values significantly less than 1 indicate departure from normality.
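This correlation view can be sketched directly. The Shapiro-Francia variant replaces the exact \(a_i\) weights with normalized expected normal quantiles, so the squared correlation below is an approximation of \(W\) for illustration, not the exact statistic computed by `shapiro.test()`:

```r
# Approximate W as the squared correlation between the ordered data and
# approximate expected normal order statistics (Shapiro-Francia variant)
set.seed(42)
x <- rnorm(30)
m <- qnorm(ppoints(length(x)))   # approximate normal scores

W_approx <- cor(sort(x), m)^2
W_approx                          # close to 1 for normal data
shapiro.test(x)$statistic         # exact W for comparison
```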
125.10.2 Hypotheses
\[
H_0: \text{The data are normally distributed}
\]
\[
H_A: \text{The data are not normally distributed}
\]
125.10.3 R Code
The Shapiro-Wilk test is available in base R without any additional packages:
# Example 1: Normal data
set.seed(42)
normal_data <- rnorm(50, mean = 100, sd = 15)
cat("Test on normally distributed data:\n")
shapiro.test(normal_data)
Test on normally distributed data:
Shapiro-Wilk normality test
data: normal_data
W = 0.98021, p-value = 0.5611
# Example 2: Non-normal data (exponential)
set.seed(42)
skewed_data <- rexp(50, rate = 0.1)
cat("Test on exponentially distributed data:\n")
shapiro.test(skewed_data)
Test on exponentially distributed data:
Shapiro-Wilk normality test
data: skewed_data
W = 0.68815, p-value = 5.102e-09
The first test (normal data) yields a large p-value, so we do not reject normality. The second test (exponential data) yields a very small p-value, providing strong evidence against normality.
125.10.4 Worked Example
A manufacturer measures the tensile strength (in MPa) of 20 steel specimens and wants to verify that the measurements are normally distributed before applying a t-test:
# Hypothetical tensile strength data for illustration
# (the chapter's actual 20 measurements are not reproduced here)
set.seed(2024)
strength <- rnorm(20, mean = 450, sd = 12)

qqnorm(strength, main = "Normal QQ Plot - Tensile Strength")
qqline(strength, col = "red", lwd = 2)
Figure 125.4: QQ plot of tensile strength data for visual normality assessment
125.10.5 Limitations
The sample size \(n\) must be between 3 and 5000
The test is sensitive to ties (repeated values); with many ties, the p-value may be unreliable
Like all hypothesis tests, the Shapiro-Wilk test becomes very sensitive with large samples – even trivial departures from normality may produce significant results when \(n\) is large
It tests only for normality, not for other distributions (unlike the K-S test)
125.10.6 Comparison: When to Use Which Normality Test
Table 125.3: Comparison of normality tests

| Test | Best For | Limitations | R Function |
|------|----------|-------------|------------|
| Shapiro-Wilk | Small to moderate samples (\(n < 50\)); highest power | \(3 \leq n \leq 5000\); sensitive to ties | shapiro.test() |
| Lilliefors | Normality with mean and SD estimated from the data | Less powerful than Shapiro-Wilk | nortest::lillie.test() |
| Kolmogorov-Smirnov | Any fully specified continuous distribution | Parameters must be specified in advance, not estimated | ks.test() |
In practice, it is recommended to use the Shapiro-Wilk test as the primary formal test for normality, supplemented by a QQ plot for visual assessment.
125.11 Pros & Cons
125.11.1 Pros
The Kolmogorov-Smirnov test has the following advantages:
Does not require binning of continuous data, preserving all information.
Distribution-free for the two-sample case.
Works well with small sample sizes.
Sensitive to differences in location, scale, and shape of distributions.
Easy to visualize and interpret.
125.11.2 Cons
The Kolmogorov-Smirnov test has the following disadvantages:
For one-sample tests, parameters must be specified in advance (not estimated from data).
Less powerful than some alternatives (e.g., Anderson-Darling, Shapiro-Wilk for normality).
Most sensitive to differences near the center of the distribution, less sensitive in the tails.
Applies only to continuous distributions.
Ties in the data require special handling.
125.12 Related Tests
Anderson-Darling test: Similar to K-S but gives more weight to the tails of the distribution. Often more powerful for detecting departures from normality.
Shapiro-Wilk test: Specifically designed for testing normality; generally more powerful than K-S for this purpose.
Cramér-von Mises test: Uses the integral of the squared difference between CDFs rather than the maximum difference.
Chi-squared goodness-of-fit test (Section 124.1): Requires binning but can handle discrete distributions and estimated parameters.
125.13 Task
Generate 100 random numbers from an exponential distribution with rate \(\lambda = 0.5\). Use the K-S test to verify that the data follow this distribution.
Generate two samples: one from a standard normal distribution and one from a t-distribution with 5 degrees of freedom. Use the two-sample K-S test to determine if they differ significantly.
The following data represent waiting times (in minutes) at a service counter: 2.1, 0.5, 3.2, 1.8, 4.5, 0.9, 2.7, 1.2, 5.1, 0.3. Test whether these data follow an exponential distribution with mean 2 minutes.
Compare the K-S test and Chi-squared goodness-of-fit test on the same dataset. Discuss the advantages and disadvantages of each approach.
Kolmogorov, Andrey N. 1933. “Sulla Determinazione Empirica Di Una Legge Di Distribuzione.” Giornale Dell’Istituto Italiano Degli Attuari 4: 83–91.
Lilliefors, Hubert W. 1967. “On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown.” Journal of the American Statistical Association 62 (318): 399–402. https://doi.org/10.1080/01621459.1967.10482916.
Shapiro, S. S., and M. B. Wilk. 1965. “An Analysis of Variance Test for Normality (Complete Samples).” Biometrika 52 (3/4): 591–611. https://doi.org/10.1093/biomet/52.3-4.591.
Smirnov, Nickolay. 1948. “Table for Estimating the Goodness of Fit of Empirical Distributions.” The Annals of Mathematical Statistics 19 (2): 279–81. https://doi.org/10.1214/aoms/1177730256.