Cochran's C test

In statistics, Cochran's C test,^[1] named after William G. Cochran, is a one-sided, upper-limit variance outlier test. It is used to decide whether a single estimate of a variance (or a standard deviation) is significantly larger than a group of variances (or standard deviations) with which that estimate is supposed to be comparable. The C test is discussed in many textbooks^[2]^[3]^[4] and has been recommended by IUPAC^[5] and ISO.^[6] Cochran's C test should not be confused with Cochran's Q test, which applies to the analysis of two-way randomized block designs.

The C test assumes a balanced design, i.e. the full data set should consist of individual data series that all have the same size. It further assumes that each individual data series is normally distributed. Although primarily an outlier test, the C test is also used as a simple alternative to regular homoscedasticity tests such as Bartlett's test, Levene's test and the Brown–Forsythe test to check a statistical data set for homogeneity of variances. An even simpler check of homoscedasticity is Hartley's F_max test,^[3] but it has the disadvantage that it accounts only for the minimum and the maximum of the variance range, whereas the C test accounts for all variances within the range.

Description

The C test detects one exceptionally large variance value at a time. The corresponding data series is then omitted from the full data set. According to ISO standard 5725,^[6] the C test may be iterated until no further exceptionally large variance values are detected, but this practice may lead to excessive rejections if the underlying data series are not normally distributed. The C test evaluates the ratio:

C_j = \frac{S_j^2}{\sum_{i=1}^{N} S_i^2}

where:

C_j = Cochran's C statistic for data series j

S_j = standard deviation of data series j

N = number of data series that remain in the data set; N is decreased in steps of 1 upon each iteration of the C test

S_i = standard deviation of data series i (1 ≤ i ≤ N)
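The ratio above can be computed directly from raw data. The following is a minimal Python sketch, assuming the data are given as a list of equally sized lists; the function name `cochran_c` and the sample numbers are illustrative and do not come from the cited sources.

```python
from statistics import variance

def cochran_c(series):
    """Return Cochran's C statistic C_j for each data series j (balanced design)."""
    sizes = {len(s) for s in series}
    if len(sizes) != 1:
        raise ValueError("balanced design required: all series must have equal size")
    # Sample variances S_i^2 (computed with n - 1 in the denominator).
    variances = [variance(s) for s in series]
    total = sum(variances)
    return [v / total for v in variances]

# Illustrative data: three series of three points each.
data = [
    [1.0, 2.0, 3.0],  # sample variance 1
    [2.0, 4.0, 6.0],  # sample variance 4
    [0.0, 1.0, 2.0],  # sample variance 1
]
c_values = cochran_c(data)
print(c_values)  # approximately [0.1667, 0.6667, 0.1667]
```

Only the largest of these values is compared with the critical value, since the test targets a single exceptionally large variance at a time.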

The C test tests the null hypothesis (H₀) against the alternative hypothesis (H_a):

H₀: All variances are equal.

H_a: At least one variance value is significantly larger than the other variance values.
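The iteration described at the start of this section (test the largest variance, omit its data series, repeat) can be sketched as follows. This is an illustration under stated assumptions, not the procedure as worded in ISO 5725: the function name `iterate_cochran` is hypothetical, the critical value is passed in as a callable of the current N, and the threshold used in the usage example is a placeholder rather than a real critical value.

```python
def iterate_cochran(variances, critical_value):
    """Repeatedly remove the largest variance while its C exceeds the critical value.

    variances: sample variances S_i^2, one per data series (balanced design).
    critical_value: callable mapping the current number of series N to C_UL.
    Returns (remaining variances, removed variances).
    """
    remaining = list(variances)
    removed = []
    while len(remaining) >= 2:
        largest = max(remaining)
        c = largest / sum(remaining)
        if c <= critical_value(len(remaining)):
            break  # no further exceptionally large variance detected
        remaining.remove(largest)  # N decreases by 1 on each iteration
        removed.append(largest)
    return remaining, removed

# Placeholder threshold for illustration only; real values of C_UL depend on
# alpha, n and N and should come from tables or from the critical-value formula.
placeholder_limit = lambda N: 0.7

kept, dropped = iterate_cochran([1.0, 1.2, 0.9, 1.1, 20.0], placeholder_limit)
print(kept, dropped)  # [1.0, 1.2, 0.9, 1.1] [20.0]
```

Passing the critical value as a function of N keeps the loop independent of how C_UL is obtained, which matters because N changes after every rejection.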

Critical values

The sample variance of data series j is considered an outlier at significance level α if C_j exceeds the upper limit critical value C_UL. C_UL depends on the desired significance level α, the number of considered data series N, and the number of data points (n) per data series. Selections of values for C_UL have been tabulated at significance levels α = 0.01,^[6]^[7]^[8] α = 0.025,^[8] and α = 0.05.^[6]^[7]^[8] C_UL can also be calculated from:^[8]^[9]

C_{\text{UL}}(\alpha, n, N) = \left[1 + \frac{N-1}{F_{\text{c}}(\alpha/N,\, n-1,\, (N-1)(n-1))}\right]^{-1}.

Here:

C_UL = upper limit critical value for one-sided test on a balanced design

α = significance level

n = number of data points per data series

N = number of data series

F_c = upper-tail critical value of Fisher's F ratio at probability α/N with (n − 1) and (N − 1)(n − 1) degrees of freedom; F_c can be obtained from tables of the F distribution^[10] or from computer software.
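As a rough illustration of the formula for C_UL, the sketch below evaluates it for a given F critical value. The function name `cochran_upper_limit` is hypothetical, and F_c is supplied by the caller rather than computed, so the example needs only the Python standard library. The value F_c ≈ 58.50 used here corresponds to α = 0.05, N = 3 and n = 2, i.e. an upper-tail probability of 0.05/3 with 1 and 2 degrees of freedom.

```python
def cochran_upper_limit(f_crit, n_series):
    """Evaluate C_UL = [1 + (N - 1) / F_c]^(-1) for a caller-supplied F_c.

    f_crit: upper-tail critical value F_c(alpha/N, n - 1, (N - 1)(n - 1)).
    n_series: number of data series N.
    """
    if n_series < 2:
        raise ValueError("at least two data series are required")
    if f_crit <= 0:
        raise ValueError("F critical value must be positive")
    return 1.0 / (1.0 + (n_series - 1) / f_crit)

# Assumption: F_c is taken from a table or from software. If SciPy is
# available, scipy.stats.f.ppf(1 - alpha / N, n - 1, (N - 1) * (n - 1))
# is one way to obtain it.
c_ul = cochran_upper_limit(58.504, 3)
print(round(c_ul, 4))  # 0.9669
```

A variance would then be flagged as an outlier at this significance level only if its C_j exceeds the returned limit.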

Generalization

The C test can be generalized to include unbalanced designs, one-sided lower limit tests and two-sided tests at any significance level α, for any number of data series N, and for any number of individual data points n_j in data series j.^[8]^[9]

References

[1] W.G. Cochran, The distribution of the largest of a set of estimated variances as a fraction of their total, Annals of Human Genetics (London) 11(1), 47–52 (January 1941).
[2] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. de Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier, Amsterdam, The Netherlands, 1997; ISBN 0-444-89724-0.
[3] P. Konieczka, J. Namieśnik, Quality Assurance and Quality Control in the Analytical Chemical Laboratory – A Practical Approach, CRC Press, Boca Raton, Florida, 2009; ISBN 978-1-4200-8270-8.
[4] J.K. Taylor, Quality Assurance of Chemical Measurements, 4th printing, Lewis Publishers, Chelsea, Michigan, 1988; ISBN 0-87371-097-5.
[5] W. Horwitz, Harmonized protocol for the design and interpretation of collaborative studies, Trends in Analytical Chemistry 7(4), 118–120 (April 1988).
[6] ISO Standard 5725–2:1994, “Accuracy (trueness and precision) of measurement methods and results – Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method”, International Organization for Standardization, Geneva, Switzerland, 1994; http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=11834
[7] R. Moore, Mathematics Department, Macquarie University, Sydney, Australia, 1999: http://faculty.washington.edu/heagerty/Books/Biostatistics/TABLES/Cochran.
[8] R.U.E. 't Lam, Scrutiny of variance results for outliers: Cochran's test optimized, Analytica Chimica Acta 659, 68–84 (2010); doi:10.1016/j.aca.2009.11.032
[9] R.U.E. 't Lam, Variance Outlier Test, blog: http://rtlam.blogspot.com/
[10] Table of critical values of the F-distribution: NIST

This article is issued from Wikipedia - version of the 12/1/2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.