Model Complexity Control in Straight Line Program Genetic Programming

César L. Alonso, José Luis Montaña, Cruz Enrique Borges

2013

Abstract

In this paper we propose a tool for controlling the complexity of Genetic Programming models. The tool is supported by the theory of Vapnik-Chervonenkis dimension (VCD) and is combined with a novel representation of models called the straight line program. Experimental results, obtained on conventional algebraic structures (such as polynomials) and on real problems, show that the empirical risk, penalized by suitable upper bounds for the Vapnik-Chervonenkis dimension, gives a smaller generalization error than conventional statistical techniques such as the Bayesian or Akaike information criteria.
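For readers unfamiliar with the representation, a straight line program is a finite sequence of instructions without branches or loops, each applying an operation to inputs, constants, or results of earlier instructions. The sketch below is only an illustration under assumed conventions: the tuple encoding, the operator set, and the name `eval_slp` are hypothetical and are not taken from the paper.

```python
import operator

# Hypothetical encoding (not the paper's): an instruction is (op, left, right).
# An operand is ("x", i) for input i, ("u", j) for the result of the earlier
# instruction j, or ("c", value) for a constant.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_slp(program, inputs):
    """Evaluate a straight line program and return its last result."""
    results = []

    def value(operand):
        kind, ref = operand
        if kind == "x":
            return inputs[ref]
        if kind == "u":
            # Only earlier instructions exist in `results`, so a forward
            # reference raises IndexError.
            return results[ref]
        return ref  # constant

    for op, left, right in program:
        results.append(OPS[op](value(left), value(right)))
    return results[-1]


# u0 = x0 * x0;  u1 = u0 + x0;  u2 = u1 * u0   computes (x^2 + x) * x^2
program = [
    ("*", ("x", 0), ("x", 0)),
    ("+", ("u", 0), ("x", 0)),
    ("*", ("u", 1), ("u", 0)),
]
print(eval_slp(program, [2]))
```

Note that `u0` is computed once and used twice; an expression tree would have to repeat that subexpression.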



Paper Citation


in Harvard Style

Alonso, C. L., Montaña, J. L. and Borges, C. E. (2013). Model Complexity Control in Straight Line Program Genetic Programming. In Proceedings of the 5th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2013), ISBN 978-989-8565-77-8, pages 25-36. DOI: 10.5220/0004554100250036


in BibTeX Style

@conference{ecta13,
author={César L. Alonso and José Luis Montaña and Cruz Enrique Borges},
title={Model Complexity Control in Straight Line Program Genetic Programming},
booktitle={Proceedings of the 5th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2013)},
year={2013},
pages={25--36},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004554100250036},
isbn={978-989-8565-77-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2013)
TI - Model Complexity Control in Straight Line Program Genetic Programming
SN - 978-989-8565-77-8
AU - Alonso, César L.
AU - Montaña, José Luis
AU - Borges, Cruz Enrique
PY - 2013
SP - 25
EP - 36
DO - 10.5220/0004554100250036