Performance of Databases Used in Data Stream Processing Environments

Manuel Weißbach, Thomas Springer

2022

Abstract

Data stream processing (DSP) is being used in more and more fields to process large amounts of data with minimal latency and high throughput. In typical setups, a stream processing engine is combined with additional components, especially database systems, to implement complex use cases, which might cause a significant decrease in processing performance. In this paper, we examine the specific data access patterns caused by data stream processing and benchmark database systems with typical use cases derived from a real-world application. Our tests involve popular databases in combination with Apache Flink to identify the system combinations with the highest processing performance. Our results show that the choice of a database is highly dependent on the data access pattern of the particular use case. In one of our benchmarks, we found a throughput difference of a factor of 46.2 between the best- and the worst-performing database. From our experience in implementing a complex real-world application, we have derived a set of performance optimization recommendations to help system developers select an appropriate database for their use case and find a high-performing system configuration.



Paper Citation


in Harvard Style

Weißbach M. and Springer T. (2022). Performance of Databases Used in Data Stream Processing Environments. In Proceedings of the 12th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-570-8, pages 15-26. DOI: 10.5220/0011018300003200


in Bibtex Style

@conference{closer22,
author={Manuel Weißbach and Thomas Springer},
title={Performance of Databases Used in Data Stream Processing Environments},
booktitle={Proceedings of the 12th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2022},
pages={15-26},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011018300003200},
isbn={978-989-758-570-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 12th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - Performance of Databases Used in Data Stream Processing Environments
SN - 978-989-758-570-8
AU - Weißbach M.
AU - Springer T.
PY - 2022
SP - 15
EP - 26
DO - 10.5220/0011018300003200