Advanced Analytics with the SAP HANA Database

Philipp Große, Wolfgang Lehner, Norman May

2013

Abstract

Complex database applications require complex custom logic to be executed in the database kernel. Traditional relational databases lack an easy-to-use programming model to implement and tune such user-defined code, which motivates developers to use MapReduce instead of traditional database systems. In this paper, we discuss four processing patterns in the context of the distributed SAP HANA database that go even beyond the classic MapReduce paradigm. We illustrate them using typical machine learning algorithms and present experimental results that demonstrate how the data flows scale out with the number of parallel tasks.


Paper Citation


in Harvard Style

Große P., Lehner W. and May N. (2013). Advanced Analytics with the SAP HANA Database. In Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA, ISBN 978-989-8565-67-9, pages 61-71. DOI: 10.5220/0004430800610071


in Bibtex Style

@conference{data13,
author={Philipp Große and Wolfgang Lehner and Norman May},
title={Advanced Analytics with the SAP HANA Database},
booktitle={Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA},
year={2013},
pages={61-71},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004430800610071},
isbn={978-989-8565-67-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA
TI - Advanced Analytics with the SAP HANA Database
SN - 978-989-8565-67-9
AU - Große P.
AU - Lehner W.
AU - May N.
PY - 2013
SP - 61
EP - 71
DO - 10.5220/0004430800610071