Physical Data Warehouse Design on NoSQL Databases - OLAP Query Processing over HBase

Lucas C. Scabora, Jaqueline J. Brito, Ricardo Rodrigues Ciferri, Cristina Dutra de Aguiar Ciferri

2016

Abstract

Data warehousing and online analytical processing (OLAP) are core technologies in business intelligence and have therefore drawn much interest from researchers in the last decade. However, these technologies have been developed mainly for relational database systems in centralized environments; they were not designed for scalable systems such as NoSQL databases. Adapting a data warehousing environment to NoSQL databases brings several advantages, such as scalability and flexibility. This paper investigates three physical data warehouse designs that adapt the Star Schema Benchmark for use in NoSQL databases. In particular, our main investigation concerns OLAP query processing over column-oriented databases using the MapReduce framework. We analyze how distributing attributes among column-families in HBase affects OLAP query performance. Our experiments show how the physical data warehouse design affects OLAP query processing time as the number of dimensions accessed and the data volume vary. We conclude that using distinct distributions of attributes among column-families can improve OLAP query performance in HBase and consequently make the benchmark more suitable for OLAP over NoSQL databases.
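To make the idea of distributing attributes among column-families concrete, the following is a minimal Python sketch. It does not reproduce the three designs evaluated in the paper, which the abstract does not describe; the two layouts, the function names, and the small attribute subset are assumptions chosen for illustration. The attribute names come from the Star Schema Benchmark. The sketch relies on one property of HBase: each column-family is stored separately, so a scan restricted to some families reads only those families' data.

```python
# Illustrative subset of Star Schema Benchmark attributes, grouped by source table.
# The grouping and the two layouts below are assumptions, not the paper's designs.
SSB_ATTRIBUTES = {
    "lineorder": ["lo_revenue", "lo_discount", "lo_quantity"],
    "date": ["d_year"],
    "customer": ["c_region"],
    "supplier": ["s_region"],
    "part": ["p_brand1"],
}


def one_family_per_table(attrs_by_table):
    """Hypothetical layout: each attribute goes to a family named after its table."""
    return {
        attr: table
        for table, attrs in attrs_by_table.items()
        for attr in attrs
    }


def single_family(attrs_by_table, family="cf"):
    """Hypothetical layout: every attribute shares one column-family."""
    return {
        attr: family
        for attrs in attrs_by_table.values()
        for attr in attrs
    }


def families_scanned(layout, query_attrs):
    """Return the sorted column-families a query over query_attrs must read."""
    return sorted({layout[attr] for attr in query_attrs})


if __name__ == "__main__":
    # A query that aggregates revenue by year and brand touches three tables.
    query = ["lo_revenue", "d_year", "p_brand1"]
    print(families_scanned(one_family_per_table(SSB_ATTRIBUTES), query))
    print(families_scanned(single_family(SSB_ATTRIBUTES), query))
```

Under the per-table layout, the number of families read grows with the number of dimensions a query accesses, while the single-family layout always reads one family that also holds attributes the query does not need. This is the kind of trade-off the experiments measure; the sketch says nothing about which layout is faster.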


Paper Citation


in Harvard Style

Scabora L., Brito J., Ciferri R. and Ciferri C. (2016). Physical Data Warehouse Design on NoSQL Databases - OLAP Query Processing over HBase. In Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-187-8, pages 111-118. DOI: 10.5220/0005815901110118


in Bibtex Style

@conference{iceis16,
author={Lucas C. Scabora and Jaqueline J. Brito and Ricardo Rodrigues Ciferri and Cristina Dutra de Aguiar Ciferri},
title={Physical Data Warehouse Design on NoSQL Databases - OLAP Query Processing over HBase},
booktitle={Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2016},
pages={111-118},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005815901110118},
isbn={978-989-758-187-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Physical Data Warehouse Design on NoSQL Databases - OLAP Query Processing over HBase
SN - 978-989-758-187-8
AU - Scabora L.
AU - Brito J.
AU - Ciferri R.
AU - Ciferri C.
PY - 2016
SP - 111
EP - 118
DO - 10.5220/0005815901110118