AN OPTIMAL EVALUATION OF GROUPBY-JOIN QUERIES IN DISTRIBUTED ARCHITECTURES

M. Al Hajj Hassan, M. Bamha

2007

Abstract

SQL queries involving join and group-by operations are common in many decision support applications, where the input relations are usually very large, so parallelizing these queries is highly recommended to obtain acceptable response times. The most significant drawbacks of the algorithms presented in the literature for such queries are that they are very sensitive to data skew and incur expensive communication and Input/Output costs when evaluating the join operation. In this paper, we present an algorithm that overcomes these drawbacks: it evaluates the "GroupBy-Join" query without directly evaluating the costly join operation, thus reducing its Input/Output and communication costs. Furthermore, we analyze the performance of this algorithm using the scalable and portable BSP (Bulk Synchronous Parallel) cost model, which predicts a linear speedup even for highly skewed data.
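To illustrate why skipping the join can pay off, the Python sketch below evaluates one simple GroupBy-Join shape, `SELECT a, SUM(S.v) FROM R, S WHERE R.a = S.a GROUP BY a`, on `p` simulated processors. The sketch assumes that the group-by attribute is the join attribute and that the aggregate is a SUM over S. Under those assumptions, each result group equals (number of R tuples with key a) × (sum of S.v over key a). Only per-key partial aggregates are exchanged, and the join itself is never materialized. This is a generic pre-aggregation sketch, not the authors' algorithm. The function and variable names are hypothetical, and the skew handling and BSP cost analysis described in the paper are not modelled.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical fragment layouts: R rows carry only the join key,
# S rows are (join_key, value) pairs.
RFragment = List[int]
SFragment = List[Tuple[int, float]]


def groupby_join_sum(r_frags: List[RFragment], s_frags: List[SFragment], p: int) -> Dict[int, float]:
    """SELECT a, SUM(S.v) FROM R, S WHERE R.a = S.a GROUP BY a, without building R join S."""
    # Phase 1: each processor pre-aggregates its local fragments by join key.
    local_r = [Counter(frag) for frag in r_frags]  # key -> number of local R tuples
    local_s = []
    for frag in s_frags:
        sums: Dict[int, float] = defaultdict(float)  # key -> local sum of S.v
        for key, v in frag:
            sums[key] += v
        local_s.append(sums)

    # Phase 2: one exchange step; each partial aggregate is sent to the
    # processor that owns its key (hash partitioning on the join key).
    r_parts = [Counter() for _ in range(p)]
    s_parts = [defaultdict(float) for _ in range(p)]
    for counts in local_r:
        for key, c in counts.items():
            r_parts[hash(key) % p][key] += c
    for sums in local_s:
        for key, total in sums.items():
            s_parts[hash(key) % p][key] += total

    # Phase 3: each owner combines count(R, a) * sum(S.v, a) for keys present
    # on both sides (inner-join semantics); `in` does not insert into a defaultdict.
    result: Dict[int, float] = {}
    for i in range(p):
        for key, c in r_parts[i].items():
            if key in s_parts[i]:
                result[key] = c * s_parts[i][key]
    return result


# Usage: two input fragments per relation, evaluated on two processors.
r_frags = [[1, 1, 2], [2, 3]]
s_frags = [[(1, 10.0), (2, 5.0)], [(1, 2.0), (4, 7.0)]]
print(sorted(groupby_join_sum(r_frags, s_frags, p=2).items()))  # [(1, 24.0), (2, 10.0)]
```

Key 1 gives 2 × 12.0 = 24.0 and key 2 gives 2 × 5.0 = 10.0. Keys 3 and 4 appear on only one side and drop out, as in an inner join. The volume exchanged in phase 2 is bounded by the number of distinct keys per processor rather than by the join size. That bound is the intuition behind avoiding direct join evaluation.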



Paper Citation


in Harvard Style

Al Hajj Hassan M. and Bamha M. (2007). AN OPTIMAL EVALUATION OF GROUPBY-JOIN QUERIES IN DISTRIBUTED ARCHITECTURES. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-972-8865-77-1, pages 246-252. DOI: 10.5220/0001281302460252


in Bibtex Style

@conference{webist07,
author={M. Al Hajj Hassan and M. Bamha},
title={AN OPTIMAL EVALUATION OF GROUPBY-JOIN QUERIES IN DISTRIBUTED ARCHITECTURES},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2007},
pages={246-252},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001281302460252},
isbn={978-972-8865-77-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - AN OPTIMAL EVALUATION OF GROUPBY-JOIN QUERIES IN DISTRIBUTED ARCHITECTURES
SN - 978-972-8865-77-1
AU - Al Hajj Hassan M.
AU - Bamha M.
PY - 2007
SP - 246
EP - 252
DO - 10.5220/0001281302460252