Performance Analysis of Continuous Binary Data Processing using Distributed Databases within Stream Processing Environments

Manuel Weißbach, Hannes Hilbert, Thomas Springer

2020

Abstract

Big data applications must process increasingly large amounts of data in ever shorter time. Often, a stream processing engine (SPE) is used to process incoming data with minimal latency. While these engines are designed to process data quickly, they are not made to persist and manage it. Thus, databases are still integrated into streaming architectures, where they often become a performance bottleneck. To overcome this issue and achieve maximum performance, all system components used must be examined in terms of their throughput and latency, and how well they interact with each other. Several authors have already analyzed the performance of popular distributed database systems. While we also analyze database performance, we focus on the interaction between the SPEs and the databases, as we assume that stream processing leads to changes in the access patterns to the databases. Moreover, our main focus is on the efficient storing and loading of binary data objects rather than typed data, since in our use cases the actual data analysis is performed not by the database but by the SPE. We benchmarked common databases within streaming environments to determine which software combination is best suited for these requirements. Our results show that database performance differs significantly depending on the access pattern used and that different software combinations lead to substantial performance differences. Depending on the access pattern, Cassandra, MongoDB and PostgreSQL achieved the best throughputs, which were mostly the highest when Apache Flink was used.

Paper Citation


in Harvard Style

Weißbach M., Hilbert H. and Springer T. (2020). Performance Analysis of Continuous Binary Data Processing using Distributed Databases within Stream Processing Environments. In Proceedings of the 10th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-424-4, pages 138-149. DOI: 10.5220/0009413301380149


in Bibtex Style

@conference{closer20,
author={Manuel Weißbach and Hannes Hilbert and Thomas Springer},
title={Performance Analysis of Continuous Binary Data Processing using Distributed Databases within Stream Processing Environments},
booktitle={Proceedings of the 10th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2020},
pages={138-149},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009413301380149},
isbn={978-989-758-424-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - Performance Analysis of Continuous Binary Data Processing using Distributed Databases within Stream Processing Environments
SN - 978-989-758-424-4
AU - Weißbach M.
AU - Hilbert H.
AU - Springer T.
PY - 2020
SP - 138
EP - 149
DO - 10.5220/0009413301380149