Performance of Databases Used in Data Stream Processing Environments
Manuel Weißbach, Thomas Springer
2022
Abstract
Data stream processing (DSP) is being used in more and more fields to process large amounts of data with minimal latency and high throughput. In typical setups, a stream processing engine is combined with additional components, especially database systems, to implement complex use cases, which might cause a significant decrease in processing performance. In this paper, we examine the specific data access patterns caused by data stream processing and benchmark database systems with typical use cases derived from a real-world application. Our tests combine popular databases with Apache Flink to identify the system combinations with the highest processing performance. Our results show that the choice of a database is highly dependent on the data access pattern of the particular use case. In one of our benchmarks, we found a throughput difference of a factor of 46.2 between the best- and worst-performing databases. From our experience in implementing a complex real-world application, we have derived a set of performance optimization recommendations to help system developers select an appropriate database for their use case and find a high-performing system configuration.
Paper Citation
in Harvard Style
Weißbach M. and Springer T. (2022). Performance of Databases Used in Data Stream Processing Environments. In Proceedings of the 12th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-570-8, pages 15-26. DOI: 10.5220/0011018300003200
in Bibtex Style
@conference{closer22,
author={Manuel Weißbach and Thomas Springer},
title={Performance of Databases Used in Data Stream Processing Environments},
booktitle={Proceedings of the 12th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2022},
pages={15-26},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011018300003200},
isbn={978-989-758-570-8},
}
in EndNote Style
TY  - CONF 
JO  - Proceedings of the 12th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI  - Performance of Databases Used in Data Stream Processing Environments
SN  - 978-989-758-570-8
AU  - Weißbach M. 
AU  - Springer T. 
PY  - 2022
SP  - 15
EP  - 26
DO  - 10.5220/0011018300003200