Table of contents

  • 121.1 Hypotheses
    • 121.1.1 Classical model
    • 121.1.2 Randomization model
  • 121.2 Analysis based on p-values
    • 121.2.1 KS Test for distributions
    • 121.2.2 KS Test for distributional shapes
  • 121.3 Assumptions
  • 121.4 Alternatives

121  Mann-Whitney U test (Wilcoxon Rank-Sum Test)

The Wilcoxon Rank-Sum Test is not the same as the Wilcoxon Signed-Rank Test described in Chapter 117. The latter is used exclusively for paired/dependent samples, as an alternative to the Paired Two Sample t-Test.

The Wilcoxon Rank-Sum Test (Wilcoxon 1945) (also commonly referred to as the Mann-Whitney or Wilcoxon-Mann-Whitney U Test (Mann and Whitney 1947)) is used as an alternative to the Unpaired Two Sample t-Test and the Welch Test. The main advantage of both types of Wilcoxon tests is that they are non-parametric, which implies that there is no need to make the usual assumptions associated with the Central Limit Theorem.

121.1 Hypotheses

The Hypotheses tested by the Wilcoxon Rank-Sum Test depend on the type of inference that is performed: the classical population model, or the randomization model, which is mostly used in medical research.

121.1.1 Classical model

In this setting we test whether the two populations have the same distribution (Null Hypothesis) or not (Alternative Hypothesis). The Wilcoxon Rank-Sum Test is used to test whether one distribution is “shifted” by a constant amount when compared to the other. When the Null Hypothesis is not rejected then there is no evidence of a location shift between the distributions. When the Null Hypothesis is rejected, one distribution may be shifted to the right or left in comparison to the other population.

Important: This location-shift interpretation is only valid when both populations have the same distributional shape (i.e., identical variance, skewness, and kurtosis). Without this assumption, a significant result could reflect differences in shape or spread rather than location. The equal-shape assumption can be assessed using diagnostics such as the Kolmogorov-Smirnov Test on centered samples (see below), but a non-significant result does not prove equal shape.

121.1.2 Randomization model

In this setting the Wilcoxon Rank-Sum Test is used to test differences between randomized groups in terms of their mean ranks. Unlike the classical model, the mean-rank interpretation does not require the equal-shape assumption: the test validly compares mean ranks regardless of whether the distributions have the same shape.

However, when shapes differ, a significant result only tells us that mean ranks differ; it does not indicate why they differ (location, spread, or shape). For this reason, the equal-shape assumption remains useful for substantive interpretation, even though it is not required for the test’s validity.

In many textbooks and software manuals, the Wilcoxon Rank-Sum Test is said to test for equality of group medians, which is incorrect.

Mean ranks are not the same as medians, which can be easily illustrated with a simple example. Suppose we have two experimental groups (a control and a treatment group). Also assume that the measurements \(x_{1i}\) of the control group are all the same (e.g. \(x_{1i} = 15\)), implying that the control observations receive the same tied mid-rank in the combined ranking. Hence, the mean rank of the control group equals that shared mid-rank (not simply the number of data points \(n\)).

Now assume that the treatment group has a non-zero variance with measurements varying around 15. This implies that not all measurements are equal and that the ranks will not all be equal. As a consequence the mean rank can differ from the control group’s mean rank while it is still possible that the median is 15.

The example illustrates that it is possible that the medians in both groups are equal, while the mean ranks are not. The treatment of ties plays a very important role in this respect.
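This tie effect is easy to check numerically. The sketch below (with small made-up measurements: a constant control group and a treatment group whose median is also 15) shows equal medians but unequal mean ranks:

```r
# Hypothetical data: control is constant, treatment varies around 15
control   <- rep(15, 5)
treatment <- c(14, 15, 15, 16, 17)

median(control)    # 15
median(treatment)  # 15

# Mid-ranks (ties receive the average rank) in the combined sample
r <- rank(c(control, treatment))
mean(r[1:5])   # mean rank of the control group: 5
mean(r[6:10])  # mean rank of the treatment group: 6
```

All seven tied observations equal to 15 share the mid-rank 5, so the control group's mean rank is 5 while the treatment group's mean rank is 6, even though both medians are 15.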

121.2 Analysis based on p-values

Again, consider the analysis that was presented for the ordinary Unpaired Two Sample t-Test. We only need to consider the case of the two-sided Hypothesis Test to illustrate the Wilcoxon Rank-Sum Test (the one-sided tests can be interpreted in similar ways).

The analysis shown below is a copy of the example shown in the previous section. The output shows the results of the Wilcoxon Rank-Sum Test which is produced in the same computation as the ordinary Unpaired Two Sample t-Test.

Interactive Shiny app (click to load).

The p-value of the Wilcoxon Rank-Sum Test is (approximately) 2.24e-11 which is smaller than the chosen type I error level of 5%. We conclude that the Null Hypothesis should be rejected, implying that (depending on whether the classical or randomization model is appropriate) either one of the following statements can be made:

  • both populations have similar distributions but are shifted along the x-axis by a constant amount
  • the mean ranks of both populations are (significantly) different from each other

As stated before, the Wilcoxon Rank-Sum Test does not make distributional assumptions (because it is a non-parametric procedure). What this means is that we do not have to assume any specific continuous distribution (such as the Normal Distribution). This, however, does not imply that there are no assumptions that should be satisfied when using this test (as explained in the classical model).

The main assumption made by the Wilcoxon Rank-Sum Test is that both populations have “similar” distributions in the sense that they have the same shape. The so-called Kolmogorov-Smirnov Test (KS Test) can be used as a diagnostic to test whether two samples are drawn from the same continuous distribution (i.e. the Null Hypothesis) or not. When the p-value of the KS Test is small, we reject the Null Hypothesis and conclude that the distributions are not equal.

The output contains two KS Tests: one that compares the distributions as a whole and one that compares only their shapes.

121.2.1 KS Test for distributions

First it is tested whether the distributions are equal (H\(_0\)). The p-value is 7.634e-10 which leads us to conclude that both distributions are different (H\(_0\) is rejected). This computation, however, is not really appropriate to test the assumption that underlies the Wilcoxon Rank-Sum Test because two distributions can be different due to inequality of the location or inequality of the shape (or both). Since the Wilcoxon Rank-Sum Test revealed that the location of both distributions is different, the KS Test should not simply compare the distributions as a whole but only take into account the shape.

121.2.2 KS Test for distributional shapes

We can explore whether the shapes of the distributions are equal by applying the KS Test to the centered samples (i.e. by subtracting the sample means from the observations). The second KS Test is useful as a shape-focused diagnostic because centering removes the location difference from the comparison.

The results show that H\(_0\) should not be rejected: the p-value is (approximately) 0.7672. Hence, we do not detect a shape difference in this sample; this is consistent with (but does not prove) the equal-shape assumption of the Wilcoxon Rank-Sum Test.

For two groups of sizes \(n_1\) and \(n_2\), let \(R_1\) be the sum of ranks for group 1. The Mann-Whitney statistic is

\[ U_1 = R_1 - \frac{n_1(n_1+1)}{2}, \quad U_2 = n_1 n_2 - U_1, \quad U = \min(U_1,U_2). \]

When ties are present, the p-value is computed with a normal approximation that includes a tie correction; for small samples without ties, exact methods are available. In R, wilcox.test chooses between these automatically.

For reporting, add an effect size such as rank-biserial correlation:

\[ r_{rb} = 1 - \frac{2U}{n_1 n_2} \]
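As a sketch (with small made-up samples), the statistic and effect size above can be computed directly from the ranks and checked against wilcox.test, whose reported W equals \(U_1\) for the first sample:

```r
A <- c(5, 7, 8, 9, 10)  # hypothetical sample 1
B <- c(1, 2, 3, 4, 6)   # hypothetical sample 2
n1 <- length(A); n2 <- length(B)

R1 <- sum(rank(c(A, B))[seq_len(n1)])  # rank sum of sample 1: 39
U1 <- R1 - n1 * (n1 + 1) / 2           # 24
U2 <- n1 * n2 - U1                     # 1
U  <- min(U1, U2)                      # 1

r_rb <- 1 - 2 * U / (n1 * n2)          # rank-biserial correlation: 0.92

wilcox.test(A, B)$statistic            # W = 24, i.e. U1
```

Note that the sign/direction of the rank-biserial correlation depends on which group is taken as the first sample; the formula with \(U = \min(U_1, U_2)\) yields its magnitude.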

To compute the Mann-Whitney U test (Wilcoxon Rank-Sum Test) on your local machine, the following script can be used in the R console:

set.seed(123)
A <- runif(15, 1, 7) + 2
B <- runif(15, 1, 7)
x <- cbind(A, B)
par1 <- 1            #column number of first sample
par2 <- 2            #column number of second sample
par3 <- 0.95         #confidence level (= 1 - alpha)
par4 <- 'two.sided'  #alternative hypothesis
par5 <- 'unpaired'   #'paired' or 'unpaired'
par6 <- 0.0          #location shift under the Null Hypothesis (mu)
main <- 'Two Samples'
paired <- (par5 != 'unpaired')
(wilcox.test(x[,par1], x[,par2], alternative=par4, paired=paired, mu=par6, conf.level=par3))
(ks.test(x[,par1], x[,par2], alternative=par4))
#shift the second sample so that both means coincide (shape-only comparison)
m1 <- mean(x[,par1], na.rm=TRUE)
m2 <- mean(x[,par2], na.rm=TRUE)
mdiff <- m1 - m2
newsam1 <- x[!is.na(x[,par1]), par1]
newsam2 <- x[,par2] + mdiff
newsam2 <- newsam2[!is.na(newsam2)]
(ks.test(newsam1, newsam2, alternative=par4))

    Wilcoxon rank sum exact test

data:  x[, par1] and x[, par2]
W = 175, p-value = 0.008642
alternative hypothesis: true location shift is not equal to 0


    Exact two-sample Kolmogorov-Smirnov test

data:  x[, par1] and x[, par2]
D = 0.53333, p-value = 0.02625
alternative hypothesis: two-sided


    Exact two-sample Kolmogorov-Smirnov test

data:  newsam1 and newsam2
D = 0.2, p-value = 0.9383
alternative hypothesis: two-sided

Note that the above script assumes that the data is in a wide format. The wilcox.test function can, of course, also be used with the formula syntax when the dataset is in a long format.
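For example, an equivalent long-format version of the same test (reusing the seed from the script above, so the generated data are identical) could look like this:

```r
set.seed(123)
df <- data.frame(
  value = c(runif(15, 1, 7) + 2, runif(15, 1, 7)),
  group = factor(rep(c("A", "B"), each = 15))
)
wilcox.test(value ~ group, data = df, alternative = "two.sided",
            conf.level = 0.95)
# reproduces the result of the wide-format call above
```

The formula syntax is convenient when the grouping variable is stored as a column, as is typical for data imported from long-format files.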

121.3 Assumptions

As mentioned in the previous section, we assume that the distributions of both populations are similar in terms of shape.

121.4 Alternatives

The alternatives to this test are explained in Section 118.4.

Mann, Henry B., and Donald R. Whitney. 1947. “On a Test of Whether One of Two Random Variables Is Stochastically Larger Than the Other.” The Annals of Mathematical Statistics 18 (1): 50–60. https://doi.org/10.1214/aoms/1177730491.
Wilcoxon, Frank. 1945. “Individual Comparisons by Ranking Methods.” Biometrics Bulletin 1 (6): 80–83. https://doi.org/10.2307/3001968.

© 2026 Patrick Wessa. Provided as-is, without warranty.
