Kruskal–Wallis one-way analysis of variance

The Kruskal–Wallis test by ranks, Kruskal–Wallis H test^[1] (named after William Kruskal and W. Allen Wallis), or one-way ANOVA on ranks, is a non-parametric method for testing whether samples originate from the same distribution.^[2]^[3]^[4] It is used for comparing two or more independent samples of equal or different sample sizes. It extends the Mann–Whitney U test, which compares two groups, to more than two groups. The parametric equivalent of the Kruskal–Wallis test is the one-way analysis of variance (ANOVA). A significant Kruskal–Wallis test indicates that at least one sample stochastically dominates one other sample. The test does not identify where this stochastic dominance occurs or for how many pairs of groups it holds. Dunn's test,^[5] or the more powerful but less well-known Conover–Iman test,^[6] can be used post hoc to analyze specific sample pairs for stochastic dominance.

Since it is a non-parametric method, the Kruskal–Wallis test does not assume a normal distribution of the residuals, unlike the analogous one-way analysis of variance. If the researcher can make the less stringent assumption that all groups have identically shaped and scaled distributions, except for any difference in medians, then the null hypothesis is that the medians of all groups are equal, and the alternative hypothesis is that the population median of at least one group differs from that of at least one other group.

Method

Rank all data from all groups together; i.e., rank the data from 1 to N ignoring group membership. Assign any tied values the average of the ranks they would have received had they not been tied.
The test statistic is given by:
$H = (N-1)\frac{\sum_{i=1}^g n_i(\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^g\sum_{j=1}^{n_i}(r_{ij} - \bar{r})^2},$ where:
- $n_{i}$ is the number of observations in group $i$
- $r_{ij}$ is the rank (among all observations) of observation $j$ from group $i$
- $N$ is the total number of observations across all groups
- $\bar{r}_{i\cdot} = \frac{\sum_{j=1}^{n_i}{r_{ij}}}{n_i}$ is the average rank of all observations in group $i$
- $\bar{r} =\tfrac 12 (N+1)$ is the average of all the $r_{ij}$ .
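The ranking step and the general formula for $H$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `midranks` and `kruskal_wallis_h` are introduced here for illustration, and the data are assumed to arrive as a list of groups, each a list of numbers.

```python
def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Compute H from the general formula (valid with or without ties)."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = midranks(pooled)          # rank all data, ignoring group membership
    mean_rank = (n_total + 1) / 2     # average of all ranks

    between = 0.0
    start = 0
    for group in groups:
        group_ranks = ranks[start:start + len(group)]
        start += len(group)
        group_mean = sum(group_ranks) / len(group)
        between += len(group) * (group_mean - mean_rank) ** 2

    total = sum((r - mean_rank) ** 2 for r in ranks)
    if total == 0:
        raise ValueError("all observations are identical; H is undefined")
    return (n_total - 1) * between / total


h = kruskal_wallis_h([[1, 2], [3, 4], [5, 6]])
print(round(h, 4))  # 4.5714
```

Because the denominator is computed directly from the ranks, this form needs no separate tie correction.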
If the data contain no ties the denominator of the expression for $H$ is exactly $(N-1)N(N+1)/12$ and $\bar{r}=\tfrac{N+1}{2}$ . Thus
$\begin{align} H & = \frac{12}{N(N+1)}\sum_{i=1}^g n_i \left(\bar{r}_{i\cdot} - \frac{N+1}{2}\right)^2 \\ & = \frac{12}{N(N+1)}\sum_{i=1}^g n_i \bar{r}_{i\cdot }^2 -\ 3(N+1). \end{align}$
The last formula only contains the squares of the average ranks.
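As a brief check of these identities, assume there are no ties, so that the ranks are exactly $1, \dots, N$. The denominator of the general formula is then
$\begin{align} \sum_{i=1}^g\sum_{j=1}^{n_i}(r_{ij} - \bar{r})^2 & = \sum_{k=1}^N k^2 - N\left(\frac{N+1}{2}\right)^2 \\ & = \frac{N(N+1)(2N+1)}{6} - \frac{N(N+1)^2}{4} = \frac{(N-1)N(N+1)}{12}, \end{align}$
so the factor $(N-1)$ cancels and leaves $\frac{12}{N(N+1)}$ . For the second form, expand the square and use $\sum_{i=1}^g n_i \bar{r}_{i\cdot} = \frac{N(N+1)}{2}$ (the sum of all ranks):
$\begin{align} \sum_{i=1}^g n_i \left(\bar{r}_{i\cdot} - \frac{N+1}{2}\right)^2 & = \sum_{i=1}^g n_i \bar{r}_{i\cdot}^2 - (N+1)\sum_{i=1}^g n_i \bar{r}_{i\cdot} + \frac{N(N+1)^2}{4} \\ & = \sum_{i=1}^g n_i \bar{r}_{i\cdot}^2 - \frac{N(N+1)^2}{4}. \end{align}$
Multiplying by $\frac{12}{N(N+1)}$ turns the last term into $3(N+1)$ .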
If the short-cut formula above is used, a correction for ties can be made by dividing $H$ by $1 - \frac{\sum_{i=1}^G (t_i^3 - t_i)}{N^3-N}$ , where $G$ is the number of groupings of different tied ranks, and $t_i$ is the number of tied values within grouping $i$, that is, the number of observations that share the $i$-th tied value. This correction usually makes little difference in the value of $H$ unless there are a large number of ties.
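The correction factor can be computed directly from the pooled observations. The sketch below is illustrative only; the function name `tie_correction` is an assumption introduced here, and the input is assumed to be the pooled data from all groups.

```python
from collections import Counter


def tie_correction(pooled_values):
    """Return 1 - sum(t^3 - t) / (N^3 - N) over the sets of tied values."""
    n_total = len(pooled_values)
    counts = Counter(pooled_values)  # t for each distinct value
    # Values that occur once contribute t^3 - t = 0, so they can be included safely.
    tie_sum = sum(t ** 3 - t for t in counts.values())
    return 1 - tie_sum / (n_total ** 3 - n_total)


# Pooled data with one pair and one triple of tied values:
# (2^3 - 2) + (3^3 - 3) = 30 and 6^3 - 6 = 210, so the factor is 1 - 30/210 = 6/7.
print(tie_correction([1, 2, 2, 3, 3, 3]))
```

For the two groups (1, 2, 2) and (3, 3, 3), the short-cut formula gives $H \approx 3.857$; dividing by $6/7$ gives $4.5$, which matches the value from the general formula.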
Finally, the p-value is approximated by $\Pr(\chi^2_{g-1} \ge H)$ . If some $n_{i}$ values are small (i.e., less than 5) the probability distribution of $H$ can be quite different from this chi-squared distribution. If a table of the chi-squared probability distribution is available, the critical value of chi-squared, $\chi^2_{\alpha: g-1}$ , can be found by entering the table at $g-1$ degrees of freedom and looking under the desired significance or alpha level.
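Where no table is at hand, the upper-tail chi-squared probability can be evaluated numerically. The following sketch uses only the Python standard library and the closed-form expressions for the chi-squared survival function with integer degrees of freedom; the names `chi2_sf` and `kruskal_wallis_p_value` are assumptions made for this illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability Pr(chi2_df >= x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-y) * sum_{j=0}^{m-1} y^j / j!
        total, term = 0.0, 1.0
        for j in range(df // 2):
            if j > 0:
                term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: erfc(sqrt(y)) + exp(-y) * sum_{j=0}^{m-1} y^(j+1/2) / Gamma(j+3/2)
    total = math.erfc(math.sqrt(y))
    for j in range((df - 1) // 2):
        total += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def kruskal_wallis_p_value(h, n_groups):
    """Approximate p-value: chi-squared upper tail with g - 1 degrees of freedom."""
    return chi2_sf(h, n_groups - 1)


# Three groups give 2 degrees of freedom; 5.9915 is the tabulated 5% critical value.
print(round(kruskal_wallis_p_value(5.9915, 3), 3))  # 0.05
```

As noted above, this approximation can be poor when some group sizes are small.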
If the statistic is not significant, then there is no evidence of stochastic dominance between the samples. However, if the test is significant, then at least one sample stochastically dominates another sample. A researcher might therefore use sample contrasts between individual sample pairs, or post hoc tests using Dunn's test, to determine which of the sample pairs are significantly different. Dunn's test (1) properly employs the same rankings as the Kruskal–Wallis test, and (2) properly employs the pooled variance implied by the null hypothesis of the Kruskal–Wallis test.^[5] When performing multiple sample contrasts or tests, the Type I error rate tends to become inflated, raising concerns about multiple comparisons.

Exact probability tables

A large amount of computing resources is required to compute exact probabilities for the Kruskal–Wallis test. Existing software provides exact probabilities only for sample sizes of fewer than about 30 participants and relies on an asymptotic approximation for larger sample sizes. Exact probability values for larger sample sizes are nonetheless available in published tables. Spurrier (2003) published exact probability tables for samples as large as 45 participants.^[7] Meyer and Seaman (2006) produced exact probability distributions for samples as large as 105 participants.^[8]
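To illustrate why exact probabilities are expensive, the sketch below enumerates every way of assigning the ranks $1, \dots, N$ to groups of given sizes and counts how often $H$ is at least as large as an observed value. It assumes no ties and very small samples, and it is a brute-force illustration, not the method used to build the published tables; the names `h_from_rank_groups` and `exact_p_value` are assumptions introduced here.

```python
from itertools import combinations


def h_from_rank_groups(rank_groups):
    """Short-cut H (no ties); uses n_i * mean_rank_i^2 = (rank sum)^2 / n_i."""
    n_total = sum(len(g) for g in rank_groups)
    s = sum(sum(g) ** 2 / len(g) for g in rank_groups)
    return 12.0 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)


def exact_p_value(sizes, h_observed, tol=1e-9):
    """Exact Pr(H >= h_observed) under the null, by full enumeration."""
    n_total = sum(sizes)
    count = 0
    hits = 0

    def recurse(remaining, idx, chosen):
        nonlocal count, hits
        if idx == len(sizes) - 1:
            # The last group takes whatever ranks are left.
            h = h_from_rank_groups(chosen + [remaining])
            count += 1
            if h >= h_observed - tol:  # tolerance guards against rounding
                hits += 1
            return
        for combo in combinations(remaining, sizes[idx]):
            rest = tuple(r for r in remaining if r not in combo)
            recurse(rest, idx + 1, chosen + [combo])

    recurse(tuple(range(1, n_total + 1)), 0, [])
    return hits / count


# Three groups of two observations: 6! / (2! 2! 2!) = 90 equally likely assignments.
print(exact_p_value((2, 2, 2), 32 / 7))
```

For three groups of two observations, the largest attainable value is $H = 32/7 \approx 4.57$, reached by 6 of the 90 assignments, so the exact p-value is $6/90 \approx 0.067$, whereas the chi-squared approximation with 2 degrees of freedom gives about 0.102. The number of assignments grows multinomially with the sample sizes, which is why full enumeration quickly becomes impractical.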

References

[1] "Kruskal-Wallis H Test using SPSS Statistics". Laerd Statistics.
[2] Kruskal; Wallis (1952). "Use of ranks in one-criterion variance analysis". Journal of the American Statistical Association. 47 (260): 583–621. doi:10.1080/01621459.1952.10483441.
[3] Corder, Gregory W.; Foreman, Dale I. (2009). Nonparametric Statistics for Non-Statisticians. Hoboken: John Wiley & Sons. pp. 99–105. ISBN 9780470454619.
[4] Siegel; Castellan (1988). Nonparametric Statistics for the Behavioral Sciences (Second ed.). New York: McGraw–Hill. ISBN 0070573573.
[5] Dunn, Olive Jean (1964). "Multiple comparisons using rank sums". Technometrics. 6 (3): 241–252. doi:10.2307/1266041.
[6] Conover, W. Jay; Iman, Ronald L. (1979). "On multiple-comparisons procedures" (PDF) (Report). Los Alamos Scientific Laboratory. Retrieved 2016-10-28.
[7] Spurrier, J. D. (2003). "On the null distribution of the Kruskal–Wallis statistic". Journal of Nonparametric Statistics. 15 (6): 685–691. doi:10.1080/10485250310001634719.
[8] Meyer; Seaman (April 2006). "Expanded tables of critical values for the Kruskal-Wallis H statistic". Paper presented at the annual meeting of the American Educational Research Association, San Francisco. Critical value tables and exact probabilities from Meyer and Seaman are available for download at http://faculty.virginia.edu/kruskal-wallis/. A paper describing their work may also be found there.

This article is issued from Wikipedia - version of the 10/29/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.