A RELATIONSHIP BETWEEN CROSS-VALIDATION AND VAPNIK BOUNDS ON GENERALIZATION OF LEARNING MACHINES

Przemysław Klęsk

Abstract

Typically, n-fold cross-validation serves two purposes: (1) to estimate the generalization properties of a model of fixed complexity, and (2) to choose, from a family of models of different complexities, the one best suited to a data set of a given size. It is obviously a time-consuming procedure. A different approach, Structural Risk Minimization (SRM), is based on the generalization bounds for learning machines given by Vapnik (Vapnik, 1995a; Vapnik, 1995b). Roughly speaking, SRM is O(n) times faster than n-fold cross-validation, but less accurate. We state and prove theorems that show the probabilistic relationship between the two approaches. In particular, we show what ε-difference between the two one may expect without actually performing the cross-validation. We conclude the paper with results of experiments confronting the probabilistic bounds we derived.
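As a rough illustration of the cost comparison made in the abstract, the sketch below (not from the paper; the constant-mean model, the data, and the exact penalty form are illustrative assumptions) contrasts n-fold cross-validation, which fits the model n times, with a single empirical-risk fit plus a Vapnik-style complexity penalty evaluated in closed form:

```python
import math
import random

def empirical_risk(model_mean, data):
    # Mean squared error of a constant predictor.
    return sum((y - model_mean) ** 2 for y in data) / len(data)

def n_fold_cv(data, n):
    """n-fold cross-validation of a constant (mean) predictor:
    the model is fit n times, once per held-out fold."""
    folds = [data[i::n] for i in range(n)]
    risks = []
    for i in range(n):
        train = [y for j, f in enumerate(folds) if j != i for y in f]
        mean = sum(train) / len(train)          # "training" = fitting the mean
        risks.append(empirical_risk(mean, folds[i]))
    return sum(risks) / n

def vapnik_style_bound(data, h, eta=0.05):
    """One fit plus a closed-form complexity penalty (a schematic form of
    a Vapnik-type bound; h plays the role of the VC dimension)."""
    l = len(data)
    mean = sum(data) / l                        # single fit on all the data
    penalty = math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)
    return empirical_risk(mean, data) + penalty

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
cv_est = n_fold_cv(sample, n=10)           # requires n fits
srm_est = vapnik_style_bound(sample, h=1)  # requires 1 fit
```

The O(n) speed-up comes solely from replacing the n training passes of cross-validation with a single fit; the price, as the abstract notes, is a looser (bound-based rather than empirical) estimate of generalization.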

References

  1. Anthony, M. and Shawe-Taylor, J. (1993). A result of Vapnik with applications. Discrete Applied Mathematics, 47(3):207-217.
  2. Bartlett, P. (1997). The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2).
  3. Bartlett, P., Kulkarni, S., and Posner, S. (1997). Covering numbers for real-valued function classes. IEEE Transactions on Information Theory, 47:1721-1724.
  4. Cherkassky, V. and Mulier, F. (1998). Learning from Data. John Wiley & Sons, Inc.
  5. Devroye, L., Györfi, L., and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York.
  6. Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. London: Chapman & Hall.
  7. Fu, W., Caroll, R., and Wang, S. (2005). Estimating misclassification error with small samples via bootstrap cross-validation. Bioinformatics, 21(9):1979-1986.
  8. Hellman, M. and Raviv, J. (1970). Probability of error, equivocation and the Chernoff bound. IEEE Transactions on Information Theory, IT-16(4):368-372.
  9. Hjorth, J. (1994). Computer Intensive Statistical Methods Validation, Model Selection, and Bootstrap. London: Chapman & Hall.
  10. Holden, S. (1996a). Cross-validation and the PAC learning model. Technical Report RN/96/64, Dept. of CS, University College, London.
  11. Holden, S. (1996b). PAC-like upper bounds for the sample complexity of leave-one-out cross-validation. In 9th Annual ACM Workshop on Computational Learning Theory, pages 41-50.
  12. Kearns, M. (1995a). A bound on the error of cross-validation, with consequences for the training-test split. In Advances in Neural Information Processing Systems 8. MIT Press.
  13. Kearns, M. (1995b). An experimental and theoretical comparison of model selection methods. In 8th Annual ACM Workshop on Computational Learning Theory, pages 21-30.
  14. Kearns, M. and Ron, D. (1999). Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Computation, 11:1427-1453.
  15. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence (IJCAI).
  16. Krzyżak, A. et al. (2000). Application of structural risk minimization to multivariate smoothing spline regression estimates. Bernoulli, 8(4):475-489.
  17. Korzeń, M. and Klęsk, P. (2008). Maximal margin estimation with perceptron-like algorithm. In Rutkowski, L., Tadeusiewicz, R., et al., editors, Lecture Notes in Artificial Intelligence, pages 597-608. Springer.
  18. Ng, A. (2004). Feature selection, L1 vs. L2 regularization, and rotational invariance. In 21st International Conference on Machine Learning, ACM International Conference Proceeding Series, volume 69.
  19. Schmidt, J., Siegel, A., and Srinivasan, A. (1995). Chernoff-Hoeffding bounds for applications with limited independence. SIAM Journal on Discrete Mathematics, 8(2):223-250.
  20. Shawe-Taylor, J. et al. (1996). A framework for structural risk minimization. COLT, pages 68-76.
  21. Vapnik, V. (1995a). The Nature of Statistical Learning Theory. Springer Verlag, New York.
  22. Vapnik, V. (1995b). Statistical Learning Theory: Inference from Small Samples. Wiley, New York.
  23. Vapnik, V. (2006). Estimation of Dependences Based on Empirical Data. Information Science & Statistics. Springer, US.
  24. Vapnik, V. and Chervonenkis, A. (1968). On the uniform convergence of relative frequencies of events to their probabilities. Dokl. Akad. Nauk, 181.
  25. Vapnik, V. and Chervonenkis, A. (1989). The necessary and sufficient conditions for the consistency of the method of empirical risk minimization. Yearbook of the Academy of Sciences of the USSR on Recognition, Classification and Forecasting, 2:217-249.
  26. Weiss, S. and Kulikowski, C. (1991). Computer Systems That Learn. Morgan Kaufmann.


Paper Citation


in Harvard Style

Klęsk P. (2011). A RELATIONSHIP BETWEEN CROSS-VALIDATION AND VAPNIK BOUNDS ON GENERALIZATION OF LEARNING MACHINES. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-40-9, pages 5-17. DOI: 10.5220/0003121000050017


in Bibtex Style

@conference{icaart11,
author={Przemysław Klęsk},
title={A RELATIONSHIP BETWEEN CROSS-VALIDATION AND VAPNIK BOUNDS ON GENERALIZATION OF LEARNING MACHINES},
booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2011},
pages={5-17},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003121000050017},
isbn={978-989-8425-40-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - A RELATIONSHIP BETWEEN CROSS-VALIDATION AND VAPNIK BOUNDS ON GENERALIZATION OF LEARNING MACHINES
SN - 978-989-8425-40-9
AU - Klęsk P.
PY - 2011
SP - 5
EP - 17
DO - 10.5220/0003121000050017