SEAR - Scalable, Efficient, Accurate, Robust kNN-based Regression

Aditya Desai, Himanshu Singh, Vikram Pudi

2010

Abstract

Regression algorithms are used for prediction (including forecasting of time-series data), inference, hypothesis testing, and modeling of causal relationships. Statistical approaches, although popular, are not generic: they require the user to make an intelligent guess about the form of the regression equation. In this paper we present a new regression algorithm, SEAR (Scalable, Efficient, Accurate, Robust kNN-based Regression). In addition, SEAR is simple and outlier-resilient. These features make SEAR an attractive alternative to existing approaches. Our experimental study compares SEAR with fourteen other algorithms on five standard real-world datasets and shows that SEAR is more accurate than all of its competitors.
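The abstract does not spell out SEAR's procedure, so the sketch below is only a point of reference: plain k-nearest-neighbour regression, the family SEAR belongs to. It shows why kNN methods need no guessed regression equation, since a prediction is simply an aggregate of the targets of the closest training points. The function name `knn_regress`, the Euclidean distance, and the pluggable `aggregate` parameter are hypothetical choices for illustration, not SEAR's design. Swapping the mean for the median is shown as one simple way to damp an outlier; whether SEAR does anything like this is not stated here.

```python
import math
from statistics import mean, median
from typing import Callable, Sequence


def knn_regress(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    query: Sequence[float],
    k: int = 3,
    aggregate: Callable[[Sequence[float]], float] = mean,
) -> float:
    """Predict a target for `query` from its k nearest training points.

    Hypothetical helper: generic kNN regression, NOT the SEAR algorithm.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training point.
    dists = [math.dist(x, query) for x in train_x]
    # Indices of the k closest points; sorted() is stable, so ties keep input order.
    nearest = sorted(range(len(train_x)), key=lambda i: dists[i])[:k]
    # Combine the neighbours' targets; no form of regression equation is assumed.
    return aggregate([train_y[i] for i in nearest])


# Usage: one-dimensional data with an outlier at x = 4.0.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1.0, 2.0, 3.0, 100.0, 5.0]
print(knn_regress(X, y, [3.5], k=3))                    # mean of 3.0, 100.0, 2.0 -> 35.0
print(knn_regress(X, y, [3.5], k=3, aggregate=median))  # median -> 3.0
```

With the mean, the single outlier drags the prediction far from its neighbours; the median ignores it, which illustrates in miniature what "outlier-resilient" means for a kNN regressor.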



Paper Citation


in Harvard Style

Desai A., Singh H. and Pudi V. (2010). SEAR - Scalable, Efficient, Accurate, Robust kNN-based Regression. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 392-395. DOI: 10.5220/0003068703920395


in Bibtex Style

@conference{kdir10,
author={Aditya Desai and Himanshu Singh and Vikram Pudi},
title={SEAR - Scalable, Efficient, Accurate, Robust kNN-based Regression},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={392-395},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003068703920395},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI  - SEAR - Scalable, Efficient, Accurate, Robust kNN-based Regression
SN  - 978-989-8425-28-7
AU  - Desai A.
AU  - Singh H.
AU  - Pudi V.
PY  - 2010
SP  - 392
EP  - 395
DO  - 10.5220/0003068703920395
ER  - 