Authors: Manuel Weißbach, Hannes Hilbert and Thomas Springer

Affiliation: Faculty of Computer Science, Technische Universität Dresden, Germany

Keyword(s): Stream Processing, Benchmarking, Database Benchmark, Big Data, Performance.

Abstract: Big data applications must process increasingly large amounts of data within ever shorter time frames. Often a stream processing engine (SPE) is used to process incoming data with minimal latency. While these engines are designed to process data quickly, they are not made to persist and manage it. Thus, databases are still integrated into streaming architectures, where they often become a performance bottleneck. To overcome this issue and achieve maximum performance, all system components used must be examined in terms of their throughput and latency, and how well they interact with each other. Several authors have already analyzed the performance of popular distributed database systems. We do the same, but focus on the interaction between the SPEs and the databases, as we assume that stream processing changes the access patterns to the databases. Moreover, our main focus is on the efficient storing and loading of binary data objects rather than typed data, since in our use cases the actual data analysis is performed not by the database but by the SPE. We benchmarked common databases within streaming environments to determine which software combination is best suited for these requirements. Our results show that database performance differs significantly depending on the access pattern used and that different software combinations lead to substantial performance differences. Depending on the access pattern, Cassandra, MongoDB and PostgreSQL achieved the best throughput, which was in most cases highest when Apache Flink was used.

CC BY-NC-ND 4.0

Paper citation in several formats:
Weißbach, M.; Hilbert, H. and Springer, T. (2020). Performance Analysis of Continuous Binary Data Processing using Distributed Databases within Stream Processing Environments. In Proceedings of the 10th International Conference on Cloud Computing and Services Science - CLOSER; ISBN 978-989-758-424-4; ISSN 2184-5042, SciTePress, pages 138-149. DOI: 10.5220/0009413301380149

@conference{closer20,
author={Manuel Weißbach and Hannes Hilbert and Thomas Springer},
title={Performance Analysis of Continuous Binary Data Processing using Distributed Databases within Stream Processing Environments},
booktitle={Proceedings of the 10th International Conference on Cloud Computing and Services Science - CLOSER},
year={2020},
pages={138-149},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009413301380149},
isbn={978-989-758-424-4},
issn={2184-5042},
}

TY - CONF
JO - Proceedings of the 10th International Conference on Cloud Computing and Services Science - CLOSER
TI - Performance Analysis of Continuous Binary Data Processing using Distributed Databases within Stream Processing Environments
SN - 978-989-758-424-4
IS - 2184-5042
AU - Weißbach, M.
AU - Hilbert, H.
AU - Springer, T.
PY - 2020
SP - 138
EP - 149
DO - 10.5220/0009413301380149
PB - SciTePress