Is Amazon Kinesis Data Analytics Suitable as Core for an Event

Processing Network Model?

Arne Koschel

1 a

, Irina Astrova

, Anna Pakosch

, Christian Gerner

, Christin Schulze

and Matthias Tyca

Hochschule Hannover, DataH, University of Applied Sciences and Arts, Hannover, Germany

Department of Software Science, School of IT, Tallinn University of Technology, Tallinn, Estonia

Keywords:

Event Processing Network (EPN), Event Processing Network Model, Amazon Kinesis Data Analytics.

Abstract:

This article looks at a proposed list of generalized requirements for a uniﬁed modelling of event processing

networks (EPNs) and its application to Amazon Kinesis Data Analytics. It enhances our previous work in

this area, in which we recently analyzed Apache Storm and earlier also the EPiA model, the BEMN model,

and the RuleCore model. Our proposed EPN requirements look at both: The logical model of EPNs and

the concrete technical implementation of them. Therefore, our article provides requirements for EPN models

based on attributes derived from event processing in general as well as existing models. Moreover, as its core

contribution, our article applies those requirements by an in depth analysis of Amazon Kinesis Data Analytics

as a concrete implementation foundation of an EPN model.

1 INTRODUCTION

Intelligent data management and processing has

changed: Collecting large amounts of data from var-

ious sources happens in every company today, called

’Big Data’. It’s not longer sufﬁcient to store data in

relational DBs, log ﬁles or events separately. Infor-

mation from data combined from different sources, is

important for the competitiveness of enterprises.

Batch Processing (Shaikh, 2019) is an established

approach for processing ’Big Data’. At its core, data

is collected and processed ’in batches’. Therefore,

data is collected for a certain period of time before be-

ing processed. The drawback is, that no real-time pro-

cessing is possible. First, data is collected for some

time before processing takes place.

Recently, (Event) Stream Processing (Shaikh,

2019) joined the ﬁeld. An approach to process data

directly after generation. Through this near real-time

processing, an action can happen immediately after

processing. Enterprises can react faster to changes.

For the implementation of Stream Processing, a

modeling technique called (complex) EPNs found its

way into practice. This approach gives a guideline, in-

cluded components and also requirements, how such

a Stream Processing should occur.

https://orcid.org/0000-0001-5695-2893

Along with the rise of Stream Processing, tools

were developed to model and implement EPNs.

Therefore, we contribute here an evaluation of differ-

ent recent tools, which support EPN realization as au-

tomatically as possible. The evaluation also addresses

the following questions: What are EPns and which re-

quirements are important to provide them? A detailed

look is taken at Apache Storm, Amazon Kinesis Data

Analytics and Microsoft Azure Stream Analytics.

In this article, which enhances our work from

(Schulze et al., 2023), we provide the following con-

tributions: First, we brieﬂy look at our – compared

to our work from (Koschel et al., 2017) – more for-

mally structured list of generalized EPN model re-

quirements (as shown in (Schulze et al., 2023) in de-

tail). Second, we provide – as the key contribution of

the present article – an in-depth evaluation of Ama-

zon Kinesis Data (AKD) Analytics with respect to our

EPN model requirements. Moreover, we also brieﬂy

compare AKD to Microsoft Azure Stream Analytics.

Future work of ours will provide an in-depth eval-

uation of Microsoft Azure Stream Analytics regarding

our EPN model requirements as well.

The remainder of this article is structured as fol-

lows: After discussing related work in Section 2, we

place a brief introduction to the topic and provide our

EPN model requirements in Section 3. Next, we take

an in-depth look at Amazon Kinesis Data Analytics

1036

Koschel, A., Astrova, I., Pakosch, A., Gerner, C., Schulze, C. and Tyca, M.

Is Amazon Kinesis Data Analytics Suitable as Core for an Event Processing Network Model?.

DOI: 10.5220/0012432800003636

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2024) - Volume 3, pages 1036-1043

ISBN: 978-989-758-680-4; ISSN: 2184-433X

in Section 4 followed by a brief comparison to Mi-

crosoft Azure Stream Analytics in Section 5. Eventu-

ally, Section 6 summarizes our results and concludes.

2 RELATED WORK

The basis of our project builds on authors in the

scope of EPN and Complex Event Processing (CEP),

for example, Dunkel and Bruns (Dunkel and Bruns,

2015) and (Bruns and Dunkel, 2010). We also used

foundations from our earlier work on EPNs, namely

(Koschel et al., 2017), (Koschel et al., 2018) and (As-

trova et al., 2019). There, we more informaly es-

tablish the requirements for EPNs and apply them to

different EPN modeling approaches and tools (EPiA,

BEMN, RuleCore). With the present paper, we ex-

tend our work with slightly reﬁned and more formally

structured requirements as well as a deep look at more

recent tools, here in particular at Apache Storm.

Compared to our earlier work, we here cast the

requirements into a template from (Rupp and Pohl,

2021), that means, we somewhat formalize them. We

use a template to deﬁne the requirements in a stan-

dardized form and to show their importance. To en-

sure the quality of the requirements, we validated

them against the quality criteria from (IEE, 1998).

Furthermore, we use the IEEE830-1998 stan-

dard (IEE, 1998) for quality criteria for require-

ments. There exists a newer standard, the

IEEE/ISO/IEC29148-2011, which describes the qual-

ity criteria from IEEE830-1998 in more summary

form. We still meet all the quality criteria from both

versions, except for the singularity. That one is new

in IEEE/ISO/IEC29148-2011.

For the description and evaluation of the different

tools, we used the documentation of the publishers.

Apache offers large documentation in (Str, 2021a) for

Apache Storm, Amazon Web Services provides in-

formation in (Str, 2021b) for Amazon Kinesis Data

Analytics and Microsoft introduces Microsoft Azure

Stream Analytics in (Microsoft, 2021).

We distinguish ourselves from other publications

by standardizing and validating the requirements of

EPNs and by evaluating various tools with different

open or closed source characteristic, effort and costs.

With this variety, we aim to give an overview of differ-

ent tools and support the decision for a suitable tool.

3 EPNs AND REQUIREMENTS

This section curtly presents the basics of EPNs and

the requirements for this kind of systems.

An event is ’a signiﬁcant change of state’ (Luck-

ham, 2002). Event Processing Networks (EPNs) can

be seen as generalized software systems that allow for

the processing of events. However, EPN models lack

standardization, which is where our work aims to con-

tribute.

3.1 Basics of Event Processing

Networks

EPNs are built on the basis of the Event-Driven Ar-

chitectures (EDA) and Complex Event Processing

(CEP). These both approaches

’[...] represent a new style of enterprise ap-

plications that places events at the center of

the software architecture - event orientation as

an architectural style.’ — Dunkel and Bruns

(Bruns and Dunkel, 2010, p. 4)

In this context, EDA is more about the design of

event-driven architectures as a design style. CEP de-

scribes a technology for dynamic processing of large

datasets (Bruns and Dunkel, 2010). Thus, CEP is a

part of an EDA, which can be used for processing

data within it. In detail, CEP describes the dynamic

processing of large data streams (also called event

streams) in real-time. An event is any happening in

the system. Here, the change of state of a fact or an

object is represented (Bruns and Dunkel, 2010).

The processing of events within a CEP is real-

ized using rules. These rules contain knowledge

about handling events or event sequences (Dunkel and

Bruns, 2015). For the realization of these rules and

the processing of the data, the CEP contains Event

Processing Agents (EPA).

An EPN is a set of EPAs which are intercon-

nected and exchange information during and about

processing of the data (Dunkel and Bruns, 2015). An

EPN can be interpreted as a graphical tool for mod-

eling the ﬂow of events for event processing systems

(Koschel et al., 2017). Thus, the main components

of EPN are EPAs in order to be able to perform CEP.

EPAs contain various components, like Event Model,

(Event) Rules and Event Procession Engine (Dunkel

and Bruns, 2015).

Other components, such as producers, are also fur-

ther elements of EPNs and can be taken from (Dunkel

and Bruns, 2015) and (Koschel et al., 2017). The next

part explains the requirements for EPNs.

3.2 Requirements

We evaluate the selected tools following an identical

set of requirements, which we have put into a stan-

dardized form as show next.

Is Amazon Kinesis Data Analytics Suitable as Core for an Event Processing Network Model?

1037

3.2.1 Handling the Requirements

The set of requirements origins from our work in

(Koschel et al., 2017). We have standardized the form

of these requirements in (Schulze et al., 2023) by ap-

plying (Rupp and Pohl, 2021) and (IEE, 1998). The

reason behind that was, that the original requirements

were just described as bullet points, had no formal

structure and were partially a little ambiguous.

To address these issues, we evaluated various re-

quirement templates how they address issues such as

writeability, readability and learnability and are com-

monly used (Robertson and Robertson, 2012). We

have chosen (Rupp and Pohl, 2021) because it pro-

vides a straight-forward structure for requirements.

In addition, we apply the quality criteria of IEEE

830-1998 (IEE, 1998) to achieve high quality require-

ments in structure and content. Speciﬁcation of the

Quality criteria according to (IEE, 1998) are require-

ments, that are correct, unambiguous, complete, con-

sistent, veriﬁable, modiﬁable, traceable and ranked

for importance and/ or stability.

The requirements are formulated according to a

template and fulﬁll all quality criteria. The require-

ments templates achieve writeability, readability and

learnability and are therefore efﬁcient. This also sat-

isﬁes the modiﬁable criteria from IEEE 830-1993.

Correctness, unambiguousness and completeness

are achieved by splitting, expanding and substituting

specialist words. Requirements are checked to be

consistent, veriﬁable, traceable and they are ranked

by importance (see more details in (Schulze et al.,

2023)).

3.2.2 The Requirements

Our standardized requirements are as follows:

• EPNR1: The tool shall offer the developer to

model events with their inherent attributes as the

central component of the engine.

• EPNR2: The tool shall map real world descrip-

tions to events as scenarios.

• EPNR3: The tool shall offer event structures as

simple, complex or aggregated. Simple events can

be created and used independently. In addition,

complex events have dependencies and references

to other events. Also, aggregated events can be

grouped logically.

• EPNR4: The tool shall offer possibilities to ex-

press the relativity of events and their temporal

and causal relationships, e.g., sequence, precon-

ditions and postconditions.

• EPNR5: The tool shall process and show the ﬂow

of events through the system.

• EPNR6: The tool shall offer the modeling of EPN

by components, their properties and used patterns.

• EPNR7: The tool shall offer the modeling of

components outside the system boundary and the

behavior between inside and outside components.

• EPNR8: The tool should be expressive in usage,

about readability, writability, learnability and efﬁ-

ciency.

• EPNR9: The tool should offer the developer fur-

ther possibilities to create the model, e.g., IDE,

graphical event programming.

In (Schulze et al., 2023) we evaluated Apache

Storm (Str, 2021a). As the major contribution of the

present paper, we will evaluate Amazon Kinesis Data

Analytics against our requirements.

4 AMAZON KINESIS DATA

ANALYTICS

In this section, the Amazon Kinesis Data (AKD) An-

alytics tool is examined and evaluated. It was chosen

by the authors for its wide and modular use in Ama-

zon Web Services (AWS).

4.1 Overview of AKD Analytics

For a consideration of AKD Analytics, the authors

consider useful to ﬁrst clarify the relationship be-

tween AKD Analytics and Apache Flink, since AKD

Analytics is built on the Apache Flink framework.

4.1.1 Apache Flink

Apache Flink is an open source framework for dis-

tributed event processing. The application can be used

for batch or stream processing and offers stateful op-

erators as a special feature (Fli, 2022d). In addition, it

is designed to operate in all common cluster environ-

ments, however, it can also act as a standalone sys-

tem and performs computations at in-memory speed

at any scale.

Apache Flink Users report benchmarks of pro-

cessing multiple trillions of events per day or main-

taining multiple terabytes of sates while running on

thousands of cores (Fli, 2022a). In addition to AWS

companies like Uber, Alibaba or Capital One use

Apache Flink for streaming analytics, search rank-

ing optimization or real-time activity monitoring and

alerting (Fli, 2022e).

For CEP Apache Flink uses distributed applica-

tions that uses a set of abstractions and technical real-

izations in the following way (Fli, 2022b): Operator

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

1038

Figure 1: Stream processing in Apache Flink (based on (Fli,

2022c)).

subtasks are chained together into tasks. Each task is

executed by one thread, this reduces the overhead of

thread-to-thread handover and buffering and increases

overall throughput while decreasing latency. The ba-

sic concept of the approach is MapReduce. As a spe-

cial feature there are stateful operators (see Figure 1).

Apache Flink requires effective allocation and

management of component resources in order to ex-

ecute streaming applications. This can be achieved

by integrating it with common cluster resources or

Apache Flink can be set up to run as a standalone

cluster or even as a library. We will only brieﬂy men-

tion the anatomy of a Apache Flink Cluster since the

AKD Analytics context will handle the clustering and

distribution in the system.

The Apache Flink runtime consists of two types

of processes, JobManager and TaskManager (Fli,

2022b).

JobManager. The JobManager decides when to

schedule the next task, reacts to ﬁnished tasks or ex-

ecution failures, coordinates checkpoints and coordi-

nates recovery on failures. This process consists of

three different components:

• The ResourceManager is responsible for re-

source de- or allocation and provisioning in a

Apache Flink Cluster and manages the task slots.

• The Dispatcher provides a REST interface to

submit Apache Flink applications for execution

and starts a new JobMaster for each submitted job.

It also runs the Apache Flink WebUI to provide

information about job executions.

• A JobMaster is responsible for managing the ex-

ecution of a single JobGraph.

TaskManager. The TaskManagers (or workers) ex-

ecute tasks of a dataﬂow. Also, they are responsible

for buffering and exchange the data streams. There

must always be at least one TaskManager. The small-

est unit of resource scheduling in a TaskManager is

a task slot. Multiple operators may execute in a task

slot (see Figure 1).

Each TaskManager is a JVM process and may ex-

ecute one or more subtasks in separate threads.

An Apache Flink Application consists of multiple

Apache Flink jobs. The execution of these jobs can

happen in a local JVM (LocalEnvironment) or on a

remote setup of clusters with multiple machines (Re-

moteEnvironment).

Jobs of an Apache Flink Application can either

be submitted to a long-running Apache Flink Session

Cluster, a dedicated Apache Flink Job Cluster, or an

Apache Flink Application Cluster. The difference be-

tween these options is mainly related to the clusters

lifecycle and to resource isolation guarantees.

Apache Flink Session Cluster.

• Cluster Lifecycle: In an Apache Flink Session

Cluster, the client connects to a pre-existing, long-

running Cluster that can accept multiple job sub-

missions. After all jobs are ﬁnished the session is

manually stopped.

• Resource Isolation: TaskManager slots are allo-

cated by the ResourceManager on job submission

and released once the job is ﬁnished. Because all

jobs are sharing the same cluster, there is some

competition for cluster resources. If some fatal

error occurs on the JobManager, it will affect all

jobs running in the cluster.

• Other Considerations: An existing cluster saves

a lot of time when requesting resources and start-

ing TaskManagers. This is important in scenar-

ios where the execution time of jobs is very short

and a high startup time would negatively impact

Is Amazon Kinesis Data Analytics Suitable as Core for an Event Processing Network Model?

1039

the end-to-end user experience as is the case with

interactive analysis of short queries, where it is

desirable that jobs can quickly perform computa-

tions using existing resources.

Apache Flink Job Cluster.

• Cluster Lifecycle: In an Apache Flink Job Clus-

ter, the available cluster manager (like YARN) is

used to spin up a cluster for each submitted job.

The client ﬁrst requests resources from the clus-

ter manager to start the JobManager and submits

the job to the Dispatcher running inside this pro-

cess. Once the job is ﬁnished, the Apache Flink

Job Cluster is torn down.

• Resource Isolation: A fatal error in the Job-

Manager only affects that one job running in the

Apache Flink Job Cluster.

• Other Considerations: The ResourceManager

has to apply and wait for external resource man-

agement components to start the TaskManager

processes and allocate resources. Apache Flink

Job Clusters are more suited to large jobs that are

long-running, have high-stability requirements

and are not sensitive to longer startup times.

Apache Flink Application Cluster.

• Cluster Lifecycle: An Apache Flink Application

Cluster is a dedicated Apache Flink Cluster that

only executes jobs from one Apache Flink Appli-

cation and where the main() method runs on the

cluster rather than the client.

The job submission is a one-step process: There

is no need to start a Apache Flink Cluster ﬁrst

and then submit a job to the existing Cluster ses-

sion, instead the application logic and dependen-

cies are packaged into a executable job JAR and

the Cluster entry point (ApplicationClusterEntry-

Point) is responsible for calling the main() method

to extract the JobGraph. This allows to deploy an

Apache Flink Application like any other applica-

tion on Kubernetes, for example. The lifetime of

an Apache Flink Application Cluster is therefore

bound to the lifetime of the Apache Flink Appli-

cation.

• Resource Isolation: In an Apache Flink Ap-

plication Cluster, the ResourceManager and Dis-

patcher are scoped to a single Apache Flink Ap-

plication, which provides a better separation of

concerns.

After the short introduction to Apache Flink as the

basis of AKD Analytics we look at the actual tool

next.

4.1.2 Analytics in Amazon Kinesis

AKD Analytics is part of the AWS Kinesis portfolio,

which still consists of AKD Firehouse for data persis-

tence. As well as AKD Streams and Amazon Kinesis

Video Streams for scalable continuous Streaming of

data. Video Streams is specialized in video data.

In this portfolio, AKD Analytics is designed for

process data from streams or batches up to real-time

processing. Like mentioned before, this processing

is done by as an Apache Flink Application and AKD

Analytics is a framework for this use. AKD Analytics

also offers SQL processing, but this is ﬂagged legacy

or deprecated. AKD Analytics does not necessarily

require Amazon Streams for processing, but can basi-

cally be used with other tuple-based streams as well.

4.1.3 Amazon Kinesis Data Analytics

Framework

To use AKD Analytics an AWS Identity and Manage-

ment Account (IAM) and the AWS Command Line

Interface (CLI) are needed. This paper will not dis-

cuss these steps further, as they are necessary in gen-

eral to use AWS services.

An AKD Analytics Application has the following

components (see Figure 1 (Com, 2022)):

• Runtime properties: Runtime properties are for

runtime conﬁguration and independent of the ap-

plication code.

• Source: Sources are the input for an applica-

tion and can be any kind of tuple-based formats,

e.g., Kinesis Data Stream or Apache Kafka (DA-,

2022a).

• Operators: The application processes the data by

means of tasks that are implemented in operator

chains. An operator can transform, enrich, or ag-

gregate data. Operators can have states.

• Sink: The results of the calculation are directed

into sinks, e.g., Kinesis data stream, a Kinesis

Data Firehose delivery stream, an Amazon S3

bucket.

To use a CEP application in AKD Analytics the

following steps are needed: There must be at least

one stream that the application can process. There

must be at least one sink for the results. Both must be

created or connected to the AWS Kinesis environment

in order to be used by the Analytics application.

In order to implement the desired business-logic,

AWS provides Java example code that can be cloned

from a Git repository (DA-, 2022b).

The core features of this example are: A Project

Object Model ﬁle, containing information about the

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

1040

applications conﬁguration and dependencies, includ-

ing the AKD Analytics libraries.

The BasicStreamingJob.java ﬁle containing the

main method that deﬁnes the application’s function-

ality. The example uses a Kinesis source to read

from the source stream, that is where the source

has been connected before. It creates source and

sink connectors to access external resources using

a Stream-Execution-Environment object, in default

these are created using static properties. To use dy-

namic application properties, createSourceFromAp-

plicationProperties and createSinkFromApplication-

Properties methods to create the connectors can be

used. These methods read the applications properties

to conﬁgure the connectors (DA-, 2022c).

After the desired logic is implemented, the code is

compiled to a JAR and uploaded to AWS. The JAR is

then used to create and run a AKD Analytics Appli-

cation that connects to the source and sinks that has

been set in the AWS before. Finally the AKD Ana-

lytics Application is started and the data is processed

(DA-, 2022c).

In practice, it boils down to the following: con-

nect a stream source, connect a sink, implement the

required business logic, i.e. using the Java example,

upload and start it in the AWS. Amazon now will han-

dle everything else corresponding to your AWS plan.

Thus, we examined Apache Flink and how it

works as a foundation for AKD Analytics, consider-

ing the ﬂow and distribution of processing. Next, we

considered the commissioning of an AKD Analytics

application. This demonstrated that AKD Analytics is

a framework. Finally, it was explained which compo-

nents can be controlled in the framework and which

components can be implemented exclusively as logic

within the Apache Flink foundation.

In the next part, we evaluate Amazon Kinesis Data

Analytics against our EPNM requirements.

4.2 Evaluation of Amazon Kinesis Data

Analytics for Modeling EPN

This part will argue, which requirements are fulﬁlled

by AKD Analytics.

• EPNR1 - fulﬁlled:

Events are abstracted by tuples. Tuples contain

structured lists of attributes which can take any

primitive or complex value. This can be done

static or dynamic, hence no restriction.

• EPNR2 - fulﬁlled:

Tuples from the source stream can be dynamically

adapted to match and describe any real world sce-

nario.

• EPNR3 - fulﬁlled:

Basically, AKD Analytics abstracts the concrete

processing of events by Apache Flink. Never-

theless, the tool offers the required functions for

the runnable AKD Analytics Application by by

means of operators.

• EPNR4 - fulﬁlled:

Through the processing steps within the Apache

Flink operator chain, causal relationships and de-

pendencies can be mapped. Furthermore, the tool

offers the direct possibility to look at temporal and

causal relations within or between several stateful

operators.

• EPNR5 - not fulﬁlled:

Although the event ﬂow can be monitored and

evaluated using AWS metrics, there is no concrete

representation in AKD Analytics in the sense of

this requirement, since this is precisely part of the

abstraction that AKD Analytics performs.

• EPNR6 - fulﬁlled:

The tool is based precisely on the fact that com-

ponents, their properties and patterns to be applied

just model and do not have to be concretized. This

is done by the framework.

• EPNR7 - fulﬁlled:

By default the components, which lie outside of

the system borders, are represented statically by

abstract objects. However, these can also be

adapted dynamically, so that the behavior of these

components can be modeled.

• EPNR8 - fulﬁlled:

In AWS, there are many tutorials in written and

video form on how to build a AKD Analytics Ap-

plication. It exists an IDE. For Java, there is a

prepared example that can be cloned from Git and

then only needs to be extended by the functional

code. Developer basically only need to familiarize

themselves with the functionalities of the library

provided.

• EPNR9 - fulﬁlled:

AWS offers AKD Analytics Studio, a specialized

IDE with an interface for Scala, Python, or SQL.

In addition to the standard features of an IDE, Stu-

dio also offers visualization capabilities.

In conclusion, AKD Analytics is a suitable tool for

CEP, can represent things of the real world and guar-

antees the processing of data at any scale. To use it,

the AWS environment is required. In return, the de-

veloper can fully focus on the domain, as AWS takes

care of and guarantees scaling and deployment at the

desired quality.

Is Amazon Kinesis Data Analytics Suitable as Core for an Event Processing Network Model?

1041

Table 1: Tools – Criteria, Characteristics, Comparison.

Amazon Kinesis

Data Analytics

Microsoft Azure

Stream Analytics

Basis MapReduce Trill

Support

Plattform-as

-a-Service (PaaS)

Plattform-as

-a-Service (PaaS)

Costs costs based on usage costs based on usage

Effort low average

Event format Tuple JSON, AVRO, csv

Environment AWS Azure

Language

Java, Scala,

Python, SQL

SQL based,

JavaScript or C#

user-deﬁned functions

Distribution

Thread-based in

Apache Flink

no details

Maximum Processing Rate real-time real-time

Reliability guaranteed by AWS 99.9%promised

Data protection

Server location can

be set

Server location can

be set

Security provided by AWS provided by Azure

5 COMPARISON

In this article, we analyzed in Section 4 Amazon Ki-

nesis Data Analytics (AKD) in-depth with respect

to our standardized set of requirements from Sec-

tion 3.2.2. We examined that Amazon Kinesis Data

Analytics mostly fullﬁls our requirements. An excep-

tion is EPNR5, however, this is more due to ’the na-

ture’ of AKD. Generally speaking, with the respect of

or requirements, AKD is well suited for the realiza-

tion of EPNs.

To provide some more distinctive criteria to other

tools, we took in particular a more developer-oriented

perspective. The result is summarized in Table 1,

which brieﬂy compares two ouf our tools under eval-

uation, namely AKD Analytics and Microsoft Azure

Stream Analytics (ASA). Developers may use this ta-

ble to identify the most important criteria that argue

for or against a tool. In particular an open source na-

ture (cf. our analysis of Apache Storm in (Schulze

et al., 2023)), price, convenience, and potential ven-

dor lock in some distinctive factors.

Mainly, the collected information for the presen-

tation of AKD Analytics (as well as other tools) was

taken from the documentation of the publishers or de-

velopers of it. Due to this, some information may

be presented subjectively, as companies would like

to widely distribute their tool in any case. Moreover,

AKD and Microsoft ASA are costly tools, so an ad-

vertising factor within the documentation cannot be

ruled out. Even more, information may be incomplete

because companies want to keep their implementa-

tions private.

Both, AKD Analytics and ASA are commercial

products and thus not free of charge. Maintenance is

supported by the vendors themselves, both are nicely

hosted and maintained by Amazon respectively Mi-

crosoft and thus possibly easier to be used compared

to self-hosted systems, such as Apache Storm (cf.

(Schulze et al., 2023)). Thus, there is no clear winner

between AKD Analytics and ASA, but more a ques-

tion of individual developer taste and skills as well as

company preferences. For example, if a company is

an AWS shop anyway, wants likely less maintenance

effort and is able to pay the costs for AKD Analytics,

then it could be more favorable. Similar arguments

hold for Microsoft ASA respectively.

6 CONCLUSION

In this article, we had a deep look at Event Process-

ing Network Models, as a foundation of Event Stream

Processing tools (cf. (Schulze et al., 2023)) and pre-

sented our enhanced (compared to our earlier work)

standardized set of requirements for EPN models in

Section 3.2.2.

As the key contribution of this article, we apply

those requirements for an in-depth look at Amazon

Kinesis Data Analytics in Section 4. It turns out, that

Amazon Kinesis Data Analytics is a well suited tool

for modeling and implementation of EPNs. Addition-

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

1042

ally we brieﬂy compared Amazon Kinesis Data Ana-

lytics with Microsoft Azure Stream Analytics in Sec-

tion 5. Already in (Schulze et al., 2023) we evaluated

Apache Storm.

Since all those tools mostly fulﬁlled our require-

ments, comparisons may need other criteria as well.

The suitability of a tool depends on more individual

circumstances, such as which kind of ’shop’ you are

– for example, Amazon vs. Microsoft –, but also how

high the own development and administration effort

should be.

In future work of ours, we will also provide in-

depth evaluations of Microsoft Azure Stream Analyt-

ics regarding our EPN model requirements.

Therefore, the decision for a suitable tool is based

on the effort, the control and the costs involved. For

these reasons, no absolute recommendation can be

made. Rather the authors recommend examining each

individual use case or at least a set of typical ones, in

order to select the ideal tool.

REFERENCES

(1998). IEEE Recommended Practice for Software Re-

quirements Speciﬁcations. IEEE Std 830-1998, pages

1–40.

(2021a). Apache Storm. Apache Software Foundation. On-

line: https://storm.apache.org/ [retrieved: 04, 2022].

(2021b). Streaming Data Solutions on AWS. Ama-

zon Web Services Inc. Online: https:

//docs.aws.amazon.com/whitepapers/latest/

streaming-data-solutions-amazon-kinesis/welcome.

html [retrieved: 04, 2022].

(2022a). Adding Streaming Data Sources to Kinesis

Data Analytics for Apache Flink. Amazon Web

Services Inc. Online: https://docs.aws.amazon.

com/kinesisanalytics/latest/java/how-sources.html

[retrieved: 04, 2022].

(2022a). Apache Flink. Apache Software Foundation. On-

line: https://ﬂink.apache.org/ﬂink-architecture.html

[retrieved: 04, 2022].

(2022b). Apache Flink Stream Concepts. Apache

Software Foundation. Online: https:

//nightlies.apache.org/ﬂink/ﬂink-docs-release-1.

14/docs/concepts/ﬂink-architecture/ [retrieved: 04,

2022].

(2022c). Apache Flink Stream Processing. Apache

Software Foundation. Online: https:

//nightlies.apache.org/ﬂink/ﬂink-docs-release-1.

14/docs/learn-ﬂink/overview/#stream-processing

[retrieved: 04, 2022].

(2022d). Apache Flink Usecases. Apache Software Founda-

tion. Online: https://ﬂink.apache.org/usecases.html

[retrieved: 04, 2022].

(2022e). Apache Flink Users. Apache Software Foundation.

Online: https://ﬂink.apache.org/poweredby.html [re-

trieved: 04, 2022].

(2022). Components of a Kinesis Data Analytics for

Flink Application. Amazon Web Services Inc. On-

line: https://docs.aws.amazon.com/kinesisanalytics/

latest/java/getting-started.html?pg=ln&cp=bn#

getting-started-components [retrieved: 04, 2022].

(2022b). Data Analytics Java Exmaple. Amazon Web Ser-

vices Inc. Online: https://github.com/aws-samples/

amazon-kinesis-data-analytics-java-examples [re-

trieved: 04, 2022].

(2022c). Data Analytics Java Exmaple Usage. Amazon Web

Services Inc. Online: https://docs.aws.amazon.com/

kinesisanalytics/latest/java/get-started-exercise.html

[retrieved: 04, 2022].

Astrova, I., Koschel, A., Kobert, S., Naumann, J., Ruhe, T.,

and Starodubtsev, O. (2019). Evaluating RuleCore as

Event Processing Network Model. Proc. 15th Inter-

national Conference on Web Information Systems and

Technologies (WEBIST 2019), pages 297–300.

Bruns, R. and Dunkel, J. (2010). Event-Driven Archi-

tecture - Softwarearchitektur f

ur ereignisgesteuerte

Gesch

aftsprozesse (Software architecture for event-

driven business processes). Springer.

Dunkel, J. and Bruns, R. (2015). Complex Event Processing

- Komplexe Analyse von massiven Datenstr

omen mit

CEP (Complex analysis of massive data streams with

CEP). Springer Vieweg.

Koschel, A., Astrova, I., Kobert, S., Naumann, J., Ruhe, T.,

and Starodubtsev, O. (2017). Towards Requirements

for Event Processing Network Models. Proc. 8th In-

ternational Conference on Information, Intelligence,

Systems, Applications (IISA 2017), pages 27–30.

Koschel, A., Astrova, I., Kobert, S., Naumann, J., Ruhe,

T., and Starodubtsev, O. (2018). On Requirements

for Event Processing Network Models Using Business

Event Modeling Notation. Proc. 2018 Conf. Intelli-

gent Computing. Advances in Intelligent Systems and

Computing (SAI 2018), pages 756–762.

Luckham, D. (2002). The Power of Events. Addison Wes-

ley, USA.

Microsoft (2021). Introduction to Azure Stream

Analytics. Microsoft Documentation. On-

line: https://docs.microsoft.com/en-us/azure/

stream-analytics/stream-analytics-introduction

[retrieved: 04, 2022].

Robertson, S. and Robertson, J. (2012). Mastering the

Requirements Process: Getting Requirements Right.

Addison-Wesley Professional.

Rupp, C. and Pohl, R. (2021). Basiswissen Requirements

Engineering (Basic knowledge Requirements Engi-

neering). dpunkt.verlag.

Schulze, C., Gerner, C., Tyca, M., Koschel, A., Pakosch, A.,

and Astrova, I. (2023). Analyzing Apache Storm as

Core for an Event Processing Network Model. Proc.

International Conference Intelligent Systems Confer-

ence (IntelliSys 2023).

Shaikh, T. (2019). Batch Processing — Hadoop

Ecosystem. K2 Data Science and Engineer-

ing. Online: https://blog.k2datascience.com/

batch-processing-hadoop-ecosystem-f6da88f11cae

[retrieved: 04, 2022].

Is Amazon Kinesis Data Analytics Suitable as Core for an Event Processing Network Model?

1043