Predicting SQL Query Execution Time with a Cost Model for Spark Platform

Aleksey Burdakov, Viktoria Proletarskaya, Andrey Ploutenko, Oleg Ermakov, Uriy Grigorev

2020

Abstract

The paper proposes a cost model for predicting query execution time in a distributed parallel system. Such estimates are essential for running a DaaS environment and for building an optimal query execution plan. The model represents a SQL query as a set of nested stars, where each star includes dimension tables, a fact table, and a Bloom filter. Bloom filters can substantially reduce network traffic in the Shuffle phase and cut join time in the Reduce stage of query execution in Spark. We propose an algorithm for generating a query implementation program. The developed model was calibrated and its adequacy was evaluated (50 points). The obtained coefficient of determination, R² = 0.966, demonstrates good model accuracy even with imprecise intermediate table cardinalities. For 77% of the points with a modelling time over 10 seconds, the modelling error is Δ < 30%. A theoretical evaluation of the model supports the modelling and experimental results for large databases.
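To illustrate why a Bloom filter attached to a star can shrink the data sent through the Shuffle phase, the following is a minimal single-process sketch in plain Python, not the authors' implementation and not Spark code. The `BloomFilter` class, the toy `dimension` and `fact` tables, and the filter sizes are all assumptions made for the example. The idea shown is that keys of the (already filtered) dimension table are inserted into the filter, and fact rows whose key cannot match are dropped before the join.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        data = repr(key).encode("utf-8")
        for i in range(self.num_hashes):
            # A different salt per hash function gives independent bit positions.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Toy star: a small dimension table (after its own predicates) and a larger fact table.
dimension = {3: "EU", 7: "US"}               # dim_key -> attribute (assumed data)
fact = [(k, k * 10) for k in range(1000)]    # (dim_key, measure) rows (assumed data)

bloom = BloomFilter(num_bits=1024, num_hashes=3)
for key in dimension:
    bloom.add(key)

# Rows that fail the filter cannot join, so they need not be shuffled at all.
prefiltered = [row for row in fact if bloom.might_contain(row[0])]

# The exact join still runs afterwards and discards any false positives.
joined = [(k, m, dimension[k]) for k, m in prefiltered if k in dimension]
```

Because a Bloom filter never yields false negatives, `joined` is the same result the join would produce without the filter; only the number of rows carried into the join changes.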

Paper Citation


in Harvard Style

Burdakov A., Proletarskaya V., Ploutenko A., Ermakov O. and Grigorev U. (2020). Predicting SQL Query Execution Time with a Cost Model for Spark Platform. In Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS, ISBN 978-989-758-426-8, pages 279-287. DOI: 10.5220/0009396202790287


in Bibtex Style

@conference{iotbds20,
author={Aleksey Burdakov and Viktoria Proletarskaya and Andrey Ploutenko and Oleg Ermakov and Uriy Grigorev},
title={Predicting SQL Query Execution Time with a Cost Model for Spark Platform},
booktitle={Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2020},
pages={279-287},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009396202790287},
isbn={978-989-758-426-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - Predicting SQL Query Execution Time with a Cost Model for Spark Platform
SN - 978-989-758-426-8
AU - Burdakov A.
AU - Proletarskaya V.
AU - Ploutenko A.
AU - Ermakov O.
AU - Grigorev U.
PY - 2020
SP - 279
EP - 287
DO - 10.5220/0009396202790287