One in ten rule

In statistics, the one in ten rule is a rule of thumb for how many predictors can be estimated from data when doing regression analysis (in particular proportional hazards models and logistic regression) while keeping the risk of overfitting low. The rule states that one predictive variable can be studied for every ten events.[1][2][3][4]

For example, if a sample of 200 patients is studied and 20 patients die during the study, only two pre-specified predictors can reliably be fitted to the total data. Similarly, if 120 patients die during the study (so that 80 patients survive), eight pre-specified predictors can be fitted reliably, because the limit is based on the smaller of the two counts (here 80). If more are fitted, overfitting is likely and the results will not predict well outside the training data. It is not uncommon to see the 1:10 rule violated in fields with many variables (e.g. gene expression studies in cancer), which decreases confidence in the reported findings.[5]
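The arithmetic in these examples can be written as a short Python sketch. The function name `max_predictors` and its parameter names are illustrative assumptions, not part of any statistical library; the calculation simply divides the smaller outcome count by ten and rounds down.

```python
def max_predictors(n_events: int, n_non_events: int,
                   events_per_predictor: int = 10) -> int:
    """Rule-of-thumb limit on the number of pre-specified predictors.

    Hypothetical helper for illustration only: the limit is taken from
    the smaller of the two outcome counts, divided by the number of
    events required per predictor (10 for the one in ten rule).
    """
    if n_events < 0 or n_non_events < 0:
        raise ValueError("counts must be non-negative")
    if events_per_predictor <= 0:
        raise ValueError("events_per_predictor must be positive")
    limiting_count = min(n_events, n_non_events)
    # Integer division: a fraction of a predictor cannot be fitted.
    return limiting_count // events_per_predictor


# The two examples from the text: 200 patients, of whom 20 or 120 die.
print(max_predictors(n_events=20, n_non_events=180))   # 2
print(max_predictors(n_events=120, n_non_events=80))   # 8
```

Passing a different value for `events_per_predictor` gives the limit under a stricter or more relaxed version of the rule.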

The one in ten rule is a minimum; a "one in 20 rule" has been suggested, indicating the need for shrinkage of regression coefficients, and a "one in 50 rule" for stepwise selection with the default p-value threshold of 5%.[4][6]

Recent studies, however, suggest that the rule may be too conservative and that five to nine events per predictor can be enough, depending on the research question.[7]
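To compare a planned model against these variants of the rule, one can compute the number of events per predictor and check which thresholds it reaches. The following Python sketch is illustrative only: the names `events_per_predictor`, `rules_met` and `RULES` are assumptions, and the threshold of 5 is simply the lower end of the "five to nine" range, chosen here for the example.

```python
# Thresholds named in the text, in ascending order. The value 5 is an
# assumption: it is the lower end of the "five to nine" range.
RULES = [
    ("relaxed (5 events per predictor)", 5),
    ("one in ten", 10),
    ("one in 20", 20),
    ("one in 50", 50),
]


def events_per_predictor(n_events: int, n_non_events: int,
                         n_predictors: int) -> float:
    """Events per predictor, using the smaller of the two outcome counts."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return min(n_events, n_non_events) / n_predictors


def rules_met(ratio: float) -> list[str]:
    """Names of the rules whose threshold the given ratio reaches."""
    return [name for name, threshold in RULES if ratio >= threshold]


# 200 patients, 20 deaths, 4 candidate predictors.
ratio = events_per_predictor(n_events=20, n_non_events=180, n_predictors=4)
print(ratio)             # 5.0
print(rules_met(ratio))  # ['relaxed (5 events per predictor)']
```

In this example the model has five events per predictor, so it falls short of the one in ten rule and reaches only the relaxed threshold.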

References

  1. Harrell, F. E. Jr.; Lee, K. L.; Califf, R. M.; Pryor, D. B.; Rosati, R. A. (1984). "Regression modelling strategies for improved prognostic prediction". Stat Med. 3 (2): 143–52. doi:10.1002/sim.4780030207.
  2. Harrell, F. E. Jr.; Lee, K. L.; Mark, D. B. (1996). "Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors" (PDF). Stat Med. 15 (4): 361–87. doi:10.1002/(sici)1097-0258(19960229)15:4<361::aid-sim168>3.0.co;2-4.
  3. Peduzzi, Peter; Concato, John; Kemper, Elizabeth; Holford, Theodore R.; Feinstein, Alvan R. (1996). "A simulation study of the number of events per variable in logistic regression analysis". Journal of Clinical Epidemiology. 49 (12): 1373–1379. doi:10.1016/s0895-4356(96)00236-3.
  4. Chapter 8: Statistical Models for Prognostication: Problems with Regression Models at the Wayback Machine (archived October 31, 2004)
  5. Ernest S. Shtatland, Ken Kleinman, Emily M. Cain. Model building in Proc PHREG with automatic variable selection and information criteria. Paper 206–30 in SUGI 30 Proceedings, Philadelphia, Pennsylvania April 10–13, 2005. http://www2.sas.com/proceedings/sugi30/206-30.pdf
  6. Steyerberg, E. W.; Eijkemans, M. J.; Harrell, F. E. Jr.; Habbema, J. D. (2000). "Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets". Stat Med. 19 (8): 1059–1079. doi:10.1002/(sici)1097-0258(20000430)19:8<1059::aid-sim412>3.0.co;2-0.
  7. http://aje.oxfordjournals.org/content/165/6/710.full
This article is issued from Wikipedia - version of the 11/18/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.