Low Cost Big Data Solutions: The Case of Apache Spark on Beowulf Clusters

Marin Fotache; Marius-Iulian Cluci; Valerică Greavu-Şerban

Topics: Data Management for Large Data; Modeling, Experiments, Sharing Technologies & Platforms; Software Frameworks (MapReduce, Spark etc) and Simulations

In Proceedings of the 5th International Conference on Internet of Things, Big Data and Security (IoTBDS) - Volume 1, pages 327-334, 2020

Authors: Marin Fotache, Marius-Iulian Cluci and Valerică Greavu-Şerban

Affiliation: Al. I. Cuza University of Iasi, Romania

Keyword(s): Big Data, Beowulf Clusters, Apache Spark, Spark SQL, Machine Learning, Distributed Computing, TPC-H.

Abstract: With distributed computing platforms deployed on affordable hardware, Big Data technologies have democratised the processing of huge volumes of structured and semi-structured data. Still, the cost of installing and operating even a relatively small cluster of commodity servers, or of renting cloud resources, can prove prohibitive for many companies and institutions. This paper builds two predictive models for estimating the main drivers of data processing performance for one of the most popular Big Data systems, Apache Spark, deployed on a gradually increasing number of nodes of a Beowulf cluster. Data processing performance was estimated with randomly generated Spark SQL queries on the TPC-H database schema, with a variable number of joins (including self-joins), predicates, groups, aggregate functions and subqueries in the FROM clause. Using two machine learning techniques, random forest and extreme gradient boosting, the predictive models estimate query duration from predictors related to cluster setup and query structure, and assess the importance of each predictor for the variability of the outcome. Results were positive and encouraging for extending the number of cluster nodes and the database scale.
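The abstract describes query-structure predictors (number of joins, predicates, groups, aggregates, FROM-clause subqueries) attached to randomly generated queries. The paper's own generator is not described here, so the following is only a hypothetical sketch of how such queries and their feature vectors might be produced. All names (`JOIN_CHAIN`, `random_query`, `with_from_subquery`) and the choice of columns are illustrative assumptions. Table and column names follow the TPC-H schema. Joins are limited to a simple foreign-key chain, so the self-joins mentioned in the abstract are not covered.

```python
import random

# Hypothetical foreign-key join chain over TPC-H tables: step k adds one table
# joined to the previous one (assumption, not the paper's join graph).
JOIN_CHAIN = [
    ("lineitem", None),
    ("orders", "lineitem.l_orderkey = orders.o_orderkey"),
    ("customer", "orders.o_custkey = customer.c_custkey"),
    ("nation", "customer.c_nationkey = nation.n_nationkey"),
    ("region", "nation.n_regionkey = region.r_regionkey"),
]

# Numeric columns usable in predicates and aggregate functions.
NUMERIC_COLS = {
    "lineitem": ["l_quantity", "l_extendedprice", "l_discount"],
    "orders": ["o_totalprice"],
    "customer": ["c_acctbal"],
    "nation": ["n_nationkey"],
    "region": ["r_regionkey"],
}

# Categorical columns usable in GROUP BY.
GROUP_COLS = {
    "lineitem": ["l_returnflag", "l_linestatus", "l_shipmode"],
    "orders": ["o_orderstatus", "o_orderpriority"],
    "customer": ["c_mktsegment"],
    "nation": ["n_name"],
    "region": ["r_name"],
}

AGG_FUNCS = ["SUM", "AVG", "MIN", "MAX", "COUNT"]


def random_query(rng, n_joins, n_predicates, n_groups, n_aggs):
    """Return (sql, features) for one random SELECT over n_joins + 1 tables."""
    if not 0 <= n_joins < len(JOIN_CHAIN):
        raise ValueError("n_joins out of range")
    tables = [name for name, _ in JOIN_CHAIN[: n_joins + 1]]
    joins = [cond for _, cond in JOIN_CHAIN[1 : n_joins + 1]]

    numeric = [f"{t}.{c}" for t in tables for c in NUMERIC_COLS[t]]
    groupable = [f"{t}.{c}" for t in tables for c in GROUP_COLS[t]]

    # Random filter predicates, grouping columns and aggregate expressions.
    predicates = [f"{rng.choice(numeric)} > {rng.randint(0, 100)}"
                  for _ in range(n_predicates)]
    groups = rng.sample(groupable, min(n_groups, len(groupable)))
    aggs = [f"{rng.choice(AGG_FUNCS)}({rng.choice(numeric)}) AS agg{i}"
            for i in range(n_aggs)]

    # Fall back to COUNT(*) when neither groups nor aggregates were requested.
    select_list = groups + aggs if (groups or aggs) else ["COUNT(*) AS cnt"]
    sql = "SELECT " + ", ".join(select_list) + " FROM " + ", ".join(tables)
    where = joins + predicates
    if where:
        sql += " WHERE " + " AND ".join(where)
    if groups:
        sql += " GROUP BY " + ", ".join(groups)

    # Query-structure predictors for a duration model.
    features = {"n_joins": n_joins, "n_predicates": n_predicates,
                "n_groups": len(groups), "n_aggs": n_aggs}
    return sql, features


def with_from_subquery(sql):
    """Wrap a query as a subquery in the FROM clause of an outer query."""
    return f"SELECT COUNT(*) AS n_rows FROM ({sql}) AS sq"


if __name__ == "__main__":
    rng = random.Random(7)
    sql, feats = random_query(rng, n_joins=2, n_predicates=1, n_groups=1, n_aggs=2)
    print(feats)
    print(with_from_subquery(sql))
```

In a setup like the one the abstract describes, each generated string could be submitted through PySpark's `SparkSession.sql(...)` and timed, and the feature dictionary, extended with cluster-setup variables such as the number of nodes, would form one training row for a random forest or gradient boosting regressor. Keeping the generator separate from execution makes it easy to reuse the same query set across cluster sizes.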

CC BY-NC-ND 4.0

Paper citation in several formats:

Fotache, M.; Cluci, M. and Greavu-Şerban, V. (2020). Low Cost Big Data Solutions: The Case of Apache Spark on Beowulf Clusters. In Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - IoTBDS; ISBN 978-989-758-426-8; ISSN 2184-4976, SciTePress, pages 327-334. DOI: 10.5220/0009407903270334

@conference{iotbds20,
author={Marin Fotache and Marius{-}Iulian Cluci and Valerică Greavu{-}Şerban},
title={Low Cost Big Data Solutions: The Case of Apache Spark on Beowulf Clusters},
booktitle={Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - IoTBDS},
year={2020},
pages={327-334},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009407903270334},
isbn={978-989-758-426-8},
issn={2184-4976},
}

TY  - CONF
JO  - Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - IoTBDS
TI  - Low Cost Big Data Solutions: The Case of Apache Spark on Beowulf Clusters
SN  - 978-989-758-426-8
IS  - 2184-4976
AU  - Fotache, M.
AU  - Cluci, M.
AU  - Greavu-Şerban, V.
PY  - 2020
SP  - 327
EP  - 334
DO  - 10.5220/0009407903270334
PB  - SciTePress
ER  - 