A Platform for Interactive Data Science with Apache Spark for On-premises Infrastructure

Rafal Lokuciejewski; Dominik Schüssele; Florian Wilhelm; Sven Groppe

Topics: Cloud Application Architectures; Cloud Management Platforms; Function-as-a-Service and Serverless Computing; High Performance Computing Cloud Applications; Use Cases, Experiences with HPC Clouds; Service Platforms

In Proceedings of the 11th International Conference on Cloud Computing and Services Science CLOSER - Volume 1, 65-76, 2021

Authors: Rafal Lokuciejewski ¹; Dominik Schüssele ¹; Florian Wilhelm ¹ and Sven Groppe ²

Affiliations: ¹ inovex GmbH, Germany; ² Institute of Information Systems (IFIS), University of Lübeck, Germany

Keyword(s): Jupyter, Spark, Kubernetes, YARN, Cluster, UEQ, Notebooks.

Abstract: Various cloud providers offer integrated platforms for interactive development in notebooks for processing and analyzing Big Data on large compute clusters. Such platforms enable users to easily leverage frameworks like Apache Spark and to manage cluster resources. However, data scientists and engineers lack a similar holistic solution when working with on-premises infrastructure. In particular, a central point of administration for accessing a notebook's UI, managing notebook kernels, allocating resources for frameworks like Apache Spark, or monitoring cluster workloads is currently missing for on-premises infrastructure. To overcome these issues and provide on-premises users with a platform for interactive development, we propose a cross-cluster architecture resulting from an extensive requirements engineering process. Based on open-source components, the designed platform provides an intuitive Web-UI that enables users to easily access notebooks, manage custom kernel environments, and monitor cluster resources and current workloads. Besides an admin panel for user restrictions, the platform provides isolation of user workloads and scalability by design. The designed platform is evaluated against prior on-premises solutions and, from a user perspective, with the User Experience Questionnaire (UEQ), an independent benchmark tool for interactive products.

CC BY-NC-ND 4.0

Paper citation in several formats:

Lokuciejewski, R., Schüssele, D., Wilhelm, F. and Groppe, S. (2021). A Platform for Interactive Data Science with Apache Spark for On-premises Infrastructure. In Proceedings of the 11th International Conference on Cloud Computing and Services Science - CLOSER; ISBN 978-989-758-510-4; ISSN 2184-5042, SciTePress, pages 65-76. DOI: 10.5220/0010447500650076

@conference{closer21,
author={Rafal Lokuciejewski and Dominik Schüssele and Florian Wilhelm and Sven Groppe},
title={A Platform for Interactive Data Science with Apache Spark for On-premises Infrastructure},
booktitle={Proceedings of the 11th International Conference on Cloud Computing and Services Science - CLOSER},
year={2021},
pages={65-76},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010447500650076},
isbn={978-989-758-510-4},
issn={2184-5042},
}

TY - CONF
JO - Proceedings of the 11th International Conference on Cloud Computing and Services Science - CLOSER
TI - A Platform for Interactive Data Science with Apache Spark for On-premises Infrastructure
SN - 978-989-758-510-4
IS - 2184-5042
AU - Lokuciejewski, R.
AU - Schüssele, D.
AU - Wilhelm, F.
AU - Groppe, S.
PY - 2021
SP - 65
EP - 76
DO - 10.5220/0010447500650076
PB - SciTePress