Authors:
Fred Ferreira
and
Robson do Nascimento Fidalgo
Affiliation:
Center of Informatics (CIn), Federal University of Pernambuco (UFPE), Recife, PE, Brazil
Keyword(s):
Data Warehouse, Distributed SQL, NewSQL, HTAP Databases, Data Modeling, Performance Analysis.
Abstract:
Data Warehouses (DWs) have become an indispensable asset for companies to support strategic decision-making. In a world where enterprise data grows exponentially, however, new DW architectures are being investigated to overcome the deficiencies of traditional relational Database Management Systems (DBMSs), driving a shift towards more modern, cloud-based DW solutions. To enhance efficiency and ease of use, the industry has seen the rise of next-generation analytics DBMSs, such as NewSQL, a class of hybrid-storage solutions that support both complex analytical queries (OLAP) and transactional queries (OLTP). We understand that few studies explore whether the way data is denormalized impacts the performance of these solutions when processing OLAP queries in a distributed environment. This paper investigates the role of data modeling in the processing time and data volume of a distributed DW. The Star Schema Benchmark was used to evaluate the performance of a Star Schema and a Fully Denormalized Schema in three different market solutions, SingleStore, Amazon Redshift, and MariaDB ColumnStore, under two different memory availability scenarios. Our results show that data denormalization is not a guarantee of improved performance, as the solutions performed very differently depending on the schema. Furthermore, we also show that a hybrid-storage (HTAP) NewSQL solution can outperform an OLAP solution in terms of mean execution time.
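The two data models compared in the abstract can be sketched in miniature. The following is a hedged illustration only (it is not the paper's benchmark, and the table and column names are invented for this example): a Star Schema keeps facts in a narrow table that joins to dimension tables, while a Fully Denormalized Schema folds the dimension attributes into one wide table, trading joins for redundancy. SQLite stands in here for the distributed engines actually studied.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Star Schema: fact rows reference a small dimension table via a key.
cur.executescript("""
CREATE TABLE dim_customer (c_key INTEGER PRIMARY KEY, c_region TEXT);
CREATE TABLE fact_sales (s_custkey INTEGER, s_revenue INTEGER);
INSERT INTO dim_customer VALUES (1, 'AMERICA'), (2, 'EUROPE');
INSERT INTO fact_sales VALUES (1, 100), (1, 50), (2, 70);
""")

# Fully Denormalized Schema: dimension attributes copied into one wide table.
cur.executescript("""
CREATE TABLE sales_flat (c_region TEXT, s_revenue INTEGER);
INSERT INTO sales_flat VALUES ('AMERICA', 100), ('AMERICA', 50), ('EUROPE', 70);
""")

# The same OLAP-style aggregation, written once with a join, once without.
star = cur.execute("""
    SELECT c_region, SUM(s_revenue)
    FROM fact_sales JOIN dim_customer ON s_custkey = c_key
    GROUP BY c_region ORDER BY c_region
""").fetchall()
flat = cur.execute("""
    SELECT c_region, SUM(s_revenue)
    FROM sales_flat
    GROUP BY c_region ORDER BY c_region
""").fetchall()

print(star == flat)  # both schemas answer the query identically
```

Both forms return the same answer; the paper's point is that which one is *faster* depends heavily on the engine, which is why denormalization alone does not guarantee better performance.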