Performance of Machine Learning and Big Data Analytics Paradigms

in Cyber-security and Cloud Computing Platforms

Gabriel Kabanda

Zimbabwe Academy of Sciences, TREP Building, University of Zimbabwe, Harare, Zimbabwe

Keywords: Cybersecurity, Artificial Intelligence, Machine Learning, Deep Learning, Big Data Analytics, Cloud

Computing.

Abstract: The purpose of the research is to evaluate Machine Learning and Big Data Analytics paradigms for use in

Cybersecurity. Cybersecurity refers to a combination of technologies, processes and operations that are

framed to protect information systems, computers, devices, programs, data and networks from internal or

external threats, harm, damage, attacks or unauthorized access. The main characteristic of Machine

Learning (ML) is the automatic data analysis of large data sets and production of models for the general

relationships found among data. ML algorithms, as part of Artificial Intelligence, can be clustered into

supervised, unsupervised, semi-supervised, and reinforcement learning algorithms. The Pragmatism

paradigm, which is in congruence with the Mixed Method Research (MMR), was used as the research

philosophy in this research as it epitomizes the congruity between knowledge and action. The researcher

analysed the ideal data analytics model for cybersecurity which consists of three major components which

are Big Data, analytics, and insights. The information that was evaluated in Big Data Analytics includes a

mixer of unstructured and semi-structured data including social media content, mobile phone records, web

server logs, and internet click stream data. Performance of Support Vector Machines, Artificial Neural

Network, K-Nearest Neighbour, Naive-Bayes and Decision Tree Algorithms was discussed. To avoid denial

of service attacks, an intrusion detection system (IDS) determined if an intrusion has occurred, and so

monitored computer systems and networks, and then raised an alert when necessary. A Cloud computing

setting was added which has advanced big data analytics models and advanced detection and prediction

algorithms to strengthen the cybersecurity system. The research results presented two models for adopting

data analytics models to cybersecurity. The first experimental or prototype model involved the design, and

implementation of a prototype by an institution and the second model involved the use service provided by

cloud computing companies. The researcher also demonstrated how this study addressed the performance

issues for Big Data Analytics and ML, and its impact on cloud computing platforms.

1 INTRODUCTION

1.1 Background

The era of the Internet of Things (IoT) generates

huge volumes of data collected from various

heteregenous sources which may include mobile

devices, sensors and social media. This Big Data

presents tremendous challenges on the storage,

processing and analytical capabilities. Cloud

Computing provides a cost-effective and valid

solution in support of Big Data storage and

execution of data analytic applications. IoT requires

both cloud computing environment to handle its data

exchange and processing; and the use of artificial

intelligence (AI) for data mining and data analytics.

A hybrid cybsecurity model which uses AI and

Machine Learning (ML) techniques may mitigate

against IoT cyber threats on cloud computing

environments. Security issues related to

virtualisation, containerization, network monitoring,

data protection and attack detection are interrogated

whilst strengthening AI/ML security solutions that

involve encryption, access control, firewall,

authentication and intrusion detection and

prevention systems at the appropriate Fog/Cloud

level.

Cybersecurity consolidates the confidentiality,

integrity, and availability of computing resources,

networks, software programs, and data into a

Kabanda, G.

Performance of Machine Learning and Big Data Analytics Paradigms in Cyber-security and Cloud Computing Platforms.

DOI: 10.5220/0010789900003167

In Proceedings of the 1st International Conference on Innovation in Computer and Information Science (ICICIS 2021), pages 33-50

ISBN: 978-989-758-577-7

Copyright

c

33

coherent collection of policies, technologies,

processes, and techniques to prevent the occurrence

of an attack (Berman et al., 2019). Cybersecurity

refers to a combination of technologies, processes

and operations that are framed to protect information

systems, computers, devices, programs, data and

networks from internal or external threats, harm,

damage, attacks or unauthorized access (Sarker et

al., 2020).

The Network Intrusion Detection Systems

(NIDS) is a category of computer software that

monitors system behaviour with a view to ascertain

anomalous violation of security policies and

distinguishes between malicious users and the

legitimate network users (Bringas and Santos, 2010).

According to (Truong et al., 2020), the components

in Intrusion Detection and Prevention Systems

(IDPSs) can be sensors or agents, servers, and

consoles for network management. An intrusion

detection and prevention system (IDPS), shown on

Figure 1 below, is placed inside the network to

detect possible network intrusions and, where

possible, prevent the cyber attacks. The key

functions of the IDPSs are to monitor, detect,

analyze, and respond to cyber threats.



Figure 1: Typical Intrusion detection system.

Computers are instructed to learn through the

process called Machine Learning (ML), a field

within artificial intelligence (AI). The main

characteristic of ML is the automatic data analysis of

large data sets and production of models for the

general relationships found among data. ML

algorithms require empirical data as input and then

learn from this input. Deep Learning (DL), as a

special category of ML, brings us closer to AI. The

three classes of ML are as illustrated on Figure 2

below [5], and these are:

a) Supervised Learning: where the methods are

given inputs labeled with corresponding outputs as

training examples;

b) Unsupervised Learning: where the methods are

given unlabeled inputs;

c) Reinforcement Learning: where data is in the

form of sequences of observations, actions, and

rewards.

Figure 2: Three levels of Machine Learning (Source: (Proko et

al., 2018)).

The transformation and expansion of the cyber space

has led to the generation, use, storage and processing

of big data, that is, large, diverse, complex,

multidimensional and usually multivariate datasets

(Mazumdar and Wang, 2018). (Hashem et al., 2015)

explained big data as the increase in volume of data

that offers difficulty in storage, processing and

analysis through the traditional database

technologies. Big Data came into existence when the

traditional relational database systems were not able

to handle the unstructured data generated by

organizations, social media, or from any other data

generating source (Mahfuzah et al., 2017). Big data

analytics makes use of analytic techniques such as

data mining, machine learning, artificial learning,

statistics, and natural language processing. In an age

of transformation and expansion in the Internet of

Things (IoT), cloud computing services and big data,

cyber-attacks have become enhanced and

complicated (Moorthy et al., 2014), and therefore

cybersecurity events become difficult or impossible

to detect using traditional detection systems (Cox

and Wang, 2014), (Hammond, 2015). Big Data has

also been defined according to the 5Vs as stipulated

by (Yang et al., 2017) where:

 Volume refers to the amount of data gathered

and processed by the organisation

ICICIS 2021 - International Conference on Innovations in Computer and Information Science

34

 Velocity referring to the time required to do

processing of the data

 Variety refers to the type of data contained in

Big Data

 Value referring to the key important features of

the data. This is defined by the added-value that the

collected data can bring to the intended processes.

 Veracity means the degree in which the leaders

trust the information to make a decision.

The supervised machine learning algorithm which

can be used for both classification or regression

challenges is called the Support Vector Machine

(SVM). The original training data can be transformed

into a higher dimension where it becomes separable

by using the SVM algorithm which searches for the

optimal linear separating hyperplane. The easiest and

simplest supervised machine learning algorithm

which can solve both classification and regression

problems is the k-nearest neighbors (KNN)

algorithm. Both the KNN and SVM can be applied to

finding the optimal handover solutions in

heterogeneous networks constituted by diverse cells.

The Hidden Markov Model (HMM) is a tool

designed for representing probability distributions of

sequences of observations. The list of supervised

learning algorithms includes Regression models, K-

nearest neighbors, Support Vector Machines, and

Bayesian Learning (Jiang et al., 2016). In Table 1, we

summarize the basic characteristics and applications

of supervised machine learning algorithms.

Table 1: Various attack descriptions (Source: (Mazumdar

and Wang, 2018)).

1.2 Statement of the Problem

Firewall protection has proved to be inadequate

because of gross limitations against external threats.

The rapid development of computing and digital

technologies, the need to revamp cyber defense

strategies has become a necessity for most

organisations (Proko et al., 2018). As a result, there

is an imperative for security network administrators

to be more flexible, adaptable, and provide robust

cyber defense systems in real-time detection of

cyber threats.The key problem is to evaluate

Machine Learning (ML) and Big Data Analytics

(BDA) paradigms for use in Cybersecurity.

1.3 Purpose of Study

The research is purposed to evaluate Machine

Learning and Big Data Analytics paradigms for use

in Cybersecurity.

1.4 Research Objectives

The research objectives are to:

1)Evaluate Machine Learning and Big Data

Analytics paradigms for use in cybersecurity.

2)Develop a Cybersecurity system that uses

Machine Learning and Big Data Analytics

paradigms.

1.5 Research Questions

The main research question was:

Which Machine Learning and Big Data Analytics

paradigms are most effective in developing a

Cybersecurity system?

The sub questions are:

1)How are the Machine Learning and Big Data

Analytics paradigms used in Cybersecurity?

2)How is a Cybersecurity system developed that

uses Machine Learning and Big Data Analytics

paradigms?

2 LITERATURE REVIEW

2.1 Overview

Computers, phones, internet and all other

information systems developed for the benefit of

humanity are susceptible to criminal activity (Cox

and Wang, 2014). Cybercrimes consist of offenses

such as computer intrusions, misuse of intellectual

property rights, economic espionage, online

extortion, international money laundering, non-

delivery of goods or services, etc. (Yang et al.,

2017). Intrusion detection and prevention systems

(IDPS) include all protective actions or

identification of possible incidents, and analysing

log information of such incidents (Truong et al.,

2020). (Yang et al., 2017) recommends the use of

various security control measures in an organisation.

Various attack descriptions from the outcome of the

research by (Mazumdar and Wang, 2018) are shown

on Table 1. The monotonic increase in an assortment

of cyber threats and malwares amply demonstrates

the inadequacy of the current countermeasures to

defend computer networks and resources. To

Performance of Machine Learning and Big Data Analytics Paradigms in Cyber-security and Cloud Computing Platforms

35

alleviate the problems of classical techniques of

cyber security, research in artificial intelligence and

more specifically machine learning is sought after

(Berman et al., 2019), (Sarker et al., 2020). To

enhance the malware and cyber-attack detection rate,

one can apply deep learning architectures to cyber

security.

2.2 Classical Machine Learning (CML)

Machine Learning (ML) is a field in artificial

intelligence where computers learn like people. We

present and briefly discuss the most commonly used

classical machine learning algorithms.

2.2.1 Logistic Regression (LR)

As an idea obtained from statistics and created by

(Petrenko and Makovechuk, 2020), logistic

regression is like linear regression, yet it averts

misclassification that may occur in linear regression.

Unlike linear regression, logistic regression results

are basically either ‘0’ or ‘1’. The efficacy of

logistic regression is mostly dependent on the size of

the training data.

2.2.2 Naive Bayes (NB)

Naive Bayes (NB) classifier is premised on the

Bayes theorem which assumes independence of

features. The independence assumptions in Naive

Bayes classifier overcomes the curse of

dimensionality.

2.2.3 Decision Tree (DT)

A Decision tree has a structure like flow charts,

where the root node is the top node and a feature of

the information is denoted by each internal node.

The algorithm might be biased and may end up

unstable since a little change in the information will

change the structure of the tree.

2.2.4 K-Nearest Neighbor (KNN)

K-Nearest Neighbor (KNN) is a non-parametric

approach which uses similarity measure in terms of

distance function classifiers other than news cases.

KNN stores the entire training data, requires larger

memory and so is computationally expensive.

2.2.5 Ada Boost (AB)

Ada Boost (AB) learning algorithm is a technique

used to boost the performance of simple learning

algorithms used for classification. Ada Boost

constructs a strong classifier using a combination of

several weak classifiers. It is a fast classifier and at

the same time can also be used as a feature learner.

This may be useful in tasks that use imbalanced data

analysis.

2.2.6 Random Forest (RF)

Random forest (RF), as an ensemble tool, is a

decision tree derived from a subset of observations

and variables. The Random Forest gives better

predictions than an individual decision tree. It uses

the concept of bagging to create several minimal

correlated decision trees.

2.2.7 Support Vector Machine (SVM)

Support Vector Machine (SVM) belongs to the

family of supervised machine learning techniques,

which can be used to solve classification and

regression problems. SVM is a linear classifier and

the classifier is a hyper plane. It separates the

training set with maximal margin. The points near to

the separating hype plane are called support vectors

and they determine the position of hyper plane.

2.3 Modern Machine Learning

Deep learning is a modern machine learning which

has the capability to take raw inputs and learns the

optimal feature representation implicitly. This has

performed well in various long standing artificial

intelligence tasks (Bringas and Santos, 2010). Most

commonly used deep learning architectures are

discussed below in detail.

2.3.1 Deep Neural Network (DNN)

An artificial neural network (ANN) is a

computational model influenced by the

characteristics of biological neural networks. The

family of ANN includes the Feed forward neural

network (FFN), Convolutional neural network and

Recurrent neural network (RNN). FFN forms a

directed graph in which a graph is composed of

neurons named as mathematical unit. Each neuron in

i

th

layer has connection to all the neurons in i + 1

th

layer.

Each neuron of the hidden layer denotes a parameter

h that is computed by

h

i

(x) = f (w

i

T x + b

i

) (1)

hii: Rdi−1 → Rdi (2)

f : R → R (3)

ICICIS 2021 - International Conference on Innovations in Computer and Information Science

36

where w

i

∈ Rd×d i−1 , b

i

∈ Rd

i

, d

i

denotes the size

of the input, f is a non-linear activation function,

ReLU.

The traditional examples of machine learning

algorithms include Linear regression, Logistic

regression, Linear discriminant analysis,

classification and regression trees, Naïve bayes, K-

Nearest Neighbour (K-NN), Kmeans clustering

Learning Vector Quantization (LVQ), Support

Vector Machines (SVM), Random Forest, Monte

Carlo, Neural networks and Q-learning. Take note

that:

 Supervised Adaptation is carried out in the

execution of system at every iteration.

 Unsupervised Adaptation follows trial and

error method. Based on the obtained fitness value,

computational model is generalized to achieve better

results in an iterative approach.

2.3.2 The Future of AI in the Fight against

Cybercrimes

Experiments showed that NeuroNet is effective

against low-rate TCP-targeted distributed DoS

attacks. (Fernando and Dawson, 2009) presented the

Intrusion Detection System using Neural Network

based Modeling (IDS-NNM) which proved to be

capable of detecting all intrusion attempts in the

network communication without giving any false

alerts (Menzes et al., 2016).

The characteristics of NIC algorithms are

partitioned into two segments such as swarm

intelligence and evolutionary algorithm. The Swarm

Intelligence-based Algorithms (SIA) are developed

based on the idea of collective behaviours of insects

in colonies, e.g. ants, bees, wasps and termites.

Intrusion detection and prevention systems (IDPS)

include all protective actions or identification of

possible incidents and analysing log information of

such incidents (Truong et al., 2020).

2.4 Big Data Analytics and

Cybersecurity

Big Data Analytics requires new data architectures,

analytical methods, and tools. Big data environments

ought to be magnetic, which accommodates all

heterogeneous sources of data. Instead of using

mechanical disk drives, it is possible to store the

primary data-base in silicon-based main memory,

which improves performance. Forecast analytics

attempt to predict cybersecurity events using

forecast analytics models and methodologies

(Petrenko and Makovechuk, 2020). Threat

intelligence helps to gather threats from big data,

analyze and filter information about these threats

and create an awareness of cybersecurity threats

(Sarker et al., 2020).

The situation awareness theory postulated by

(Xin et al., 2019) posits that the success of a

cybersecurity domain depends on its ability to obtain

real-time, accurate and complete information about

cybersecurity events or incidents (Menzes et al.,

2016). The situation awareness model consists of

situation awareness, decisions and action

performance

as shown in Figure 3. The

Figure 3: simplified theoretical model based on situation awareness.

Situational

Awareness

•Level 1: Gathering

evidence

•Level 2:

Analysing

evidence

•Level 3: Predictive

analytics

Decisions

•Capabilities

•Interface

•Mechanisms

•Stress &

Workload

•Complexity

•Workload

•Automation

Action

Performance

•Capabilities

•Interface

•Mechanisms

•Stress &

Workload

•Complexity

•Workload

•Automation

Performance of Machine Learning and Big Data Analytics Paradigms in Cyber-security and Cloud Computing Platforms

37

transformation and expansion of the cyberspace

(Cox and Wang, 2014) has rendered traditional

intrusion detection and malware detection systems

obsolete. Further, even the data mining models that

have been used in the past are no longer sufficient

for the challenges in cyber security (Cox and Wang,

2014).

A big data analytics model for cybersecurity can

be evaluated on the basis of its agility and robustness

(Cox and Wang, 2014). According to (Pense, 2014),

Big Data is defined not only by the amount of the

information that it delivers but also by its

complexity and by the speed that it is analyzed and

delivered.

2.5 Advances in Cloud Computing

Cloud computing is about using the internet to

access someone else's software running on someone

else's hardware in someone else's data center

(Umamaheswari and Sujatha, 2017). Cloud

Computing is essentially virtualized distributed

processing, storage, and software resources and a

service, where the focus is on delivering computing

as a on-demand, pay-as-you-go service. The NIST

Cloud computing framework states that cloud

computing is made up of five essential

characteristics, three service models and four

deployment models (Gheyas and Abdallah, 2016), as

shown on Figure 4. The five (5) essential

characteristics of Cloud Computing are briefly

explained follows:

1.On-demand Self-service: A consumer can

unilaterally provision computing capabilities such as

server time and network storage as needed

automatically, without requiring human interaction

with a service provider.

2.Broad Network Access: Heterogeneous client

platforms available over the network come with

numerous capabilities that enable provision of

network access.

3.Resource Pooling: Computing resources are

pooled together in a multi-tenant model depending

on the consumer demand in a location independent

manner.

4.Rapid Elasticity:

This is when unlimited capabilities are rapidly and

elastically provisioned or purchased to quickly scale

out; and rapidly released to quickly scale in.

5.Measured Service:

A transparent metering capability can be

automatically controlled and optimized in cloud

systems at some level of abstraction appropriate to

the type of service.

Service delivery in Cloud computing comprises

three (3) Cloud Service Models, namely Software-

as-a-Service (SaaS), Platform-as-a-Service (PaaS)

and Infrastructure-as-a-Service (IaaS). These three

models are shown on Figure 5, are discussed below.

2.5.1 Software as a Service (SaaS)

The provider’s applications running on a cloud

infrastructure provide a capability to the consumer

for use. It utilizes the Internet to deliver applications

to the consumers (e.g., Google Apps, Salesforce,

Dropbox, Sage X3 and office 365) (Buyya et al.,

2008). This is about a wide range of applications

from social to enterprise applications such as email

hosting, enterprise resource planning and supply

chain management. The consumer only handles

minimal user specific application configuration

settings. SaaS provides off-the-shelf applications

offered over the internet and is the most widely used

service model (Gheyas and Abdallah, 2016); (Hadi,

2015). Examples include Google Docs, Aviary,

Pixlr, and the Microsoft Office Web Application.

2.5.2 Platform as a Service (PaaS)

PaaS provides to the consumer infrastructure for

third-party applications. Just like in SaaS the

consumer does not manage or control the underlying

cloud infrastructure including network, servers,

operating systems, or storage, but does have control

over the deployed applications and possibly

configuration settings for the application-hosting

environment (Gheyas and Abdallah, 2016); (Hadi,

2015). Examples include Windows Azure, Apache

Stratos, Google App Engine, CloudFoundry,

Heroku, AWS (Beanstalk) and OpenShift (Buyya et

al., 2008) & (Suryavanshi, 2017). PaaS supports

business agility and provides an enabling

environment for a consumer to run applications and

services including Language, Operating System

(OS), Database, Middleware and Other applications.

2.5.3 Infrastructure as a Service (IaaS)

This provisions processing, networks, storage, and

other essential computing resources on which the

consumer is then able to install and run arbitrary

software, that can include operating systems (Virtual

machines (VM), appliances, etc.) and applications

(Gheyas and Abdallah, 2016); (Hadi, 2015).

Common global examples include Amazon Web

Services (AWS), Cisco Metapod, Microsoft Azure,

Rackspace and the local ones include TelOne cloud

services

and Dandemutande (Buyya et al., 2008).

ICICIS 2021 - International Conference on Innovations in Computer and Information Science

38

Figure 4: NIST Visual Model of Cloud Computing Definition Source: (Gheyas and Abdallah, 2016).

IaaS is a Cloud service that allows existing

applications to run on its hardware. It rents out

resources dynamically wherever they are needed.

Services include Computer Servers, Data Storage,

Firewall and Load Balancer.

3 CLOUD DEPLOYMENT

MODELS

The three commonly-used cloud deployment models

are private, public, and hybrid. An additional model

is the community cloud which is less commonly

used. In a Cloud context the term deployment

basically refers to where the software is made

available, in other words where it is running.

3.1 Private Cloud

The private cloud is normally either owned or

exclusively used by a single organization. The

services and infrastructure are permanently kept on a

private network, the hardware and software are

dedicated solely to the particular organisation. The

major advantage of this model is the improved

security as resources are not shared with others

thereby allowing for higher levels of control and

security (Burt et al., 2013).



Figure 5: Cloud Computing Service Models.

3.2 Public Cloud

The cloud infrastructure is provisioned for use by

the general public. The public cloud is sold to the

public, as a mega-scale infrastructure, and is

available to the general public. (Napanda et al.,

2013) further clarifies that cloud services are

provided on a subscription basis to the public. The

advantages include lower costs, near-unlimited

scalability and high reliability (Burt et al., 2013).

Examples include Amazon (EC2), IBM’s Blue

Cloud, Sun Cloud, Google App Engine and

Windows Azure (Marzantowicz, 2015).

Performance of Machine Learning and Big Data Analytics Paradigms in Cyber-security and Cloud Computing Platforms

39

3.3 Hybrid Cloud

A hybrid cloud model is a mix of two or more cloud

deployment models such as private, public or hybrid

(Sen and Tiwari, 2017; Fehling et al., 2014). This

model requires determining the best split between

the public and private cloud components. The

advantages include control over sensitive data

(private cloud), flexibility, ease of gradual migration

(Burt et al., 2013), and data and application

portability (KPMG, 2018).

3.4 Community Cloud

This model is provisioned for exclusive use by a

particular community of consumers bound by shared

interests (e.g., policy and compliance considerations,

mission and security requirements) and third-party

providers (Gheyas and Abdallah, 2016). A typical

example is the U.S.-based exclusive IBM SoftLayer

cloud which is dedicated for use by federal agencies

only. This approach builds confidence in the

platform, which cloud consumers will use to process

their sensitive workloads (Marzantowicz, 2015).

3.5 Cloud Computing Benefits

Cloud computing has many benefits for the

organizations and these include cost savings,

scalability, anytime anywhere access, use of latest

software versions, energy saving and quick rollout

of business solutions. The general benefits

(Kobielus, 2018) include the following (Lee, 2017):

 free capital expenditure

 accessibility from anywhere at anytime

 no maintenance headaches

 improved control over documents as files will

be centrally managed

 Dynamically scalable

 Device independent

 Instant (Cost-efficient and Task-Centrism)

 Private Server Cost

The NIST Cloud Computing Definition Framework

is shown below on Figure 6.

Figure 6: The NIST Cloud Computing Definition

Framework.

Cloud computing leverages competitive advantage

and provides improved IT capabilities. The Business

benefits of

Cloud Computing include the following:

 Almost zero upfront infrastructure investment

 Just-in-time Infrastructure

 More efficient resource utilization

 Usage-based costing

 Reduced time to market

 Flexibility

 Cost Reduction

 Agility



Automatic software/hardware upgrades

The Technical Benefits of Cloud Computing are:

 Automation – “Scriptable infrastructure”

 Auto-scaling

 Proactive Scaling

 More Efficient Development lifecycle

 Improved Testability

 Disaster Recovery and Business Continuity

However, the major issues of concern and cons on

Cloud Computing include the following:

 Requires a constant internet connection

 Doesn’t work well with low-speed connections

 Can be slower than using desktop software

 Features might be more limited

 Stored data might not be secure

 If the cloud loses your data, big problem

 Privacy

 Security

 Availability

 Legal Issue

 Compliance

 Performance

In conclusion the characteristics of cloud computing

ICICIS 2021 - International Conference on Innovations in Computer and Information Science

40

are leveraged through the following: Massive scale;

Homogeneity; Virtualization; Resilient computing;

Low cost software; Geographic distribution; Service

orientation; and Advanced security technologies.

4 NETWORK FUNCTION

VIRTUALIZATION

Network function virtualization (NFV) is a new

paradigm to design and operate telecommunication

networks. Traditionally, these networks rely on

dedicated hardware-based network equipment and

their functions to provide communication services.

However, this reliance is becoming increasingly

inflexible and inefficient, especially in dealing with

traffic bursts for example during large crowd events.

NFV strives to overcome current limitations by (1)

implementing network functions in software and (2)

deploying them in a virtualized environment. The

resulting virtualized network functions (VNFs)

require a virtual infrastructure that is flexible,

scalable and fault tolerant.

Virtualization is basically making a virtual image

or “version” of something usable on multiple

machines at the same time. Virtualization

technology has the drawbacks of the chance of a

single point of failure of the software achieving the

virtualization and the performance overhead of the

entire system due to virtualization.Virtualization in

general has tremendous advantages. To

accommodate the needs of the industry and

operating environment, to create a more efficient

infrastructure – virtualization process has been

modified as a powerful platform, such that the

process virtualization greatly revolves around one

piece of very important software.

The advantages of virtual machines are as

follows:

 Where the physical hardware is unavailable,

run the operating systems,

 Easier to create new machines, backup

machines, etc.,

 Use of “clean” installs of operating systems

and software for software testing

 Emulate more machines than are physically

available,

 Timeshare lightly loaded systems on one host,

 Debug problems (suspend and resume the

problem machine),

 Easy migration of virtual machines,

 Run legacy systems!

The virtualization process has been modified as a

powerful platform, such that the process

virtualization greatly revolves around one piece of

very important software called a hypervisor. Thus, a

VM must host an OS kernel.

Virtualization: allows the running of multiple

operating systems on a single physical system and

share the underlying hardware resources.

Virtualization entail

s

abstraction and

encapsulation.

However, Clouds rely heavily on virtualization,

whereas Grids do not rely on virtualization as much

as clouds. In Virtualization, a hypervisor is a piece

of computer software that creates and runs virtual

machines.

Instead of installing the operating system as well

as all the necessary software in a virtual machine,

the docker images can be easily built with a

Dockerfifile since the hardware resources, such as

CPU and memory, will be returned to the operating

system immediately. Therefore, many new

applications are programmed into containers.

Cgroups allow system administrators to allocate

resources such as CPU, memory, network, or any

combination of them, to the running containers. This

is illustrated in Figure 7 below.



Figure 7: Architecture comparison of virtual machine Vs

container.

Virtualization is the optimum way to enhance

resource utilization in efficient manner. The core

component of virtualization is Hypervisors. A

Hypervisor is a software which provides isolation

for virtual machines running on top of physical

hosts. The thin layer of software that typically

provides capabilities to virtual parti-tioning that runs

directly on hardware. It provides a potential for

virtual partitioning and responsible for running

multiple kernels on top of the physical host.

Containers are different from Virtualization with

respect to the following aspects:.

1. Simple:- Easy sharing of a hardware resources

clean command line interface, simple REST API.

2. Fast:-Rapid provisioning, instant guest boot, and

no virtualization overhead so as fast as bare metal.

Performance of Machine Learning and Big Data Analytics Paradigms in Cyber-security and Cloud Computing Platforms

41

3.Secure:- Secure by default, combine all available

kernel security feature with AppArmor, user

namespaces, SECCOMP.

4. Scalable:- The quality-of-service may be

broadcast from the from a single container on a

developer laptop to a container per host in a

datacentre. This is also includes remote image

services with Extensible storage and networking.

5. Control groups (cgroups) :- This is a kernel-

provided mechanism for administration, grouping

and tracking through a virtual filesystem.

Docker containers share the operating system and

important resources, such as depending libraries,

drivers or binaries, with its host and therefore they

occupy less physical resources.

5 RESEARCH METHODOLOGY

5.1 Presentation of the Methodology

The Pragmatism paradigm was used in this research

and this is intricately related to the Mixed Methods

Research (MMR) .

5.1.1 Research Approach and Philosophy

The researcher adopts a qualitative approach in form

of focus group discussion to research. Since the

analysis is done to establish differences in data

analytics models for cybersecurity without the

necessity of quantifying the analysis (Iafrate, 2015) .

The researcher adopts a postmodern philosophy

to guide the research. Firstly the researcher notes

that the definition, scope and measurement of

cybersecurity differs between countries and across

nations (Moorthy et al., 2014). Further, the post-

modern view is consistent with descriptive research

designs which seek to interpret situations or models

in their particular contexts (Vadapalli, 2020).

5.1.2 Research Design and Methods

The researcher adopts a descriptive research design

since the intention is to systematically describe the

facts and characteristics of big data analytics models

for cybersecurity. The purpose of the study is

essentially an in-depth description of the models

(Iafrate, 2015).

A case study research method was adopted in

this study. In this respect each data analytics model

for cybersecurity is taken as a separate case to be

investigated in its own separate context (Vadapalli,

2020). Prior research has tended to use case studies

in relation to the study of cybersecurity (Moorthy et

al., 2014). However, the researcher develops a

control case that accounts for an ideal data analytics

model for cybersecurity for comparative purposes.

5.2 Population and Sampling

5.2.1 Population

The research population for the purpose of this study

consists of all data analytics models for

cybersecurity that have been proposed and

developed in literature, journals, conference

proceedings and working papers. This is consistent

with previous research which involves a systematic

review of literature (Gheyas and Abdallah, 2016).

5.2.2 Sample

The researcher identified two data analytics models

or frameworks from a review of literature and the

sample size of 8. Eight participants in total were

interviewed. However, while this may be limited

data, it will be sufficient for the present needs of this

study. Research in future may review more journals

to identify more data analytics models which can be

applied to cybersecurity.

5.3 Sources and Types of Data

The researcher uses secondary data in order to

investigate the application of data analytics models

in cybersecurity.

5.4 Model for Analysis

In analyzing the different data analytics models for

cybersecurity the researcher makes reference to the

characteristics of an ideal data analytics model for

cybersecurity. In constructing an ideal model, the

researcher integrates various literature sources. The

basic framework for big data analytics model for

cybersecurity consists of three major components

which are big data, analytics, and insights (Cox and

Wang, 2014). However, a fourth component may be

identified as prediction (or predictive analytics)

(Gheyas and Abdallah, 2016). This is depicted in

Figure 8 below:

ICICIS 2021 - International Conference on Innovations in Computer and Information Science

42



Figure 8: Big data analytics model for cybersecurity.

5.5 Big Data Analytics

The address the concerns of big data about

cybersecurity, more robust big data analytics models

for cybersecurity have been developed in data

mining techniques and machine learning (Cox and

Wang, 2014). Big data analytics employ data mining

reactors and algorithms, intrusion and malware

detection techniques and vector machine learning

techniques for cybersecurity (Cox and Wang, 2014).

However, it has been observed that adversarial

programs have tended to modify their behavior by

adapting to the reactors and algorithms designed to

detect them (Cox and Wang, 2014). Further,

intrusion detection systems are faced with

challenges such as unbounded patterns, data

nonstationarity, uneven time lags, individuality, high

false alarm rates, and collusion attacks (Gheyas and

Abdallah, 2016). This necessitates a multi-layered

and multi-dimensional approach to big data analytics

for cybersecurity (Hammond, 2015), (Sarker et al.,

2020). In other words an effective big data analytics

model for cybersecurity must be able to detect

intrusions and malware at every layer in the

cybersecurity framework.

5.6 Predictive Analytics

Predictive analytics refer to the application of a big

data analytics model for cybersecurity to derive,

from current cybersecurity data, the likelihood of a

cybersecurity event occurring in future (Gheyas and

Abdallah, 2016). In essence, a data analytics model

for cybersecurity should be able to integrate these

components if it is to be effective in its major

functions of gathering big data about cybersecurity,

analyzing big data about cybersecurity threats,

providing actionable insights and predicting likely

future cybersecurity incidents.

5.7 Validity and Reliability

The researcher solicited comments from peers on the

emerging findings and also feedback to clarify the

biases and assumptions of the researcher to ensure

internal validity of the study (Vadapalli, 2020).

5.8 Possible Outcomes

The expected accuracy rate for the research should

be according to Table 3 below, which shows the

international benchmark.

Table 3: Comparative Detection accuracy rate (%).

Classifier

Detection

Accuracy

(%)

Time taken

to build the

Model in

seconds

False

Alarm

rate (%)

Decision Trees (J48) 81.05 ** **

Naive Bayes 76.56 ** **

Random Forest 80.67 ** **

SVM 69.52 ** **

AdaBoost 90.31 ** 3.38

Mutlinomal Naive Bayes

+N2B

38.89 0.72 27.8

Multinomal Naive Bayes

updateable + N2B

38.94 1.2 27.9

Discriminative

Multinomal Bayes + PCA

94.84 118.36 4.4

Discriminative

Multinomal Bayes + RP

81.47 2.27 12.85

Discriminative

Multinomal Bayes + N2B

96.5 1.11 3.0

6 ANALYSIS AND RESEARCH

OUTCOMES

6.1 Overview

The analysis of possible attacks on an intrusion

network is shown on Figure 9 below.

Performance of Machine Learning and Big Data Analytics Paradigms in Cyber-security and Cloud Computing Platforms

43



Figure 9: Analysis of Attack (Source: (Hashem et al.,

2015)).

(Moorthy et al., 2014) highlighted the use of

Machine Learning (ML), Neural Network and Fuzzy

Logic to detect attacks on private networks on the

different Artificial Intelligence (AI) techniques. It is

not technically feasible to develop a perfect

sophisticated Intrusion Detection System, since the

majority of IDS are signature based.

The IDS is divided into either as a Host IDS

(HIDS) or as a Network IDS (NIDS). Analysis of

the network traffic can be handled by a NIDS which

distinguishes the unlicensed, illegitimate and

anomalous behavior on the network. Packets

traversing through the network should generally be

captured by the IDS using network taps or span port

in order to detect and flag any suspicious activity

(Moorthy et al., 2014)). Anomalous behavior on the

specific device or malicious activity can be

effectively detected by a device specific IDS. The

vulnerability of networks and susceptibility to cyber

attacks is exacerbated by the use of wireless

technology (Hammond, 2015).

The gross inadequacies of classical security

measures have been overtly exposed. Therefore,

effective solutions for a dynamic and adaptive

network defence mechanism should be determined.

Neural networks can provide better solutions for the

representative sets of training data (Hammond,

2015). (Hammond, 2015) argues for the use of ML

classification problems solvable with supervised or

semi-supervised learning models for the majority of

the IDS. However, the one major limitation of the

work done by (Hammond, 2015) is on the

informational structure in cybersecurity for the

analysis of the strategies and the solutions of the

players.

Intrusion attack classification requires

optimization and enhancement of the efficiency of

data mining techniques. The pros and cons of each

algorithm using the NSL-KDD dataset are shown on

Table 4 below.

Table 4: Performance of Support Vector Machines,

Artificial Neural Network, K-Nearest Neighbour, Naive-

Bayes and Decision Tree Algorithms.

An intrusion detection system determines if an

intrusion has occurred, and so monitors computer

systems and networks, and the IDS raises an alert

when necessary (Truong et al., 2020). However,

(Truong et al., 2020) addressed the problems of

Anomaly Based Signature (ABS) which reduces

false positives by allowing a user to interact with the

detection engine and raising classified alerts. The

advantages and disadvantages of ABSs and SBSs are

summarised on table, Table 5, below.

Table 5: Advantages and disadvantages of ABSs and SBSs

models (Source: (Truong et al., 2020)).

Detection

Model

Advantages Disadvantages

Signature-

based

Low false positive

rate

Does not require

training

Classified alerts

Cannot detect new

attacks

Requires continuous

updates

Tuning could be a

thorny tas

k

Anomaly-

based

Can detect new

attacks

Self-learning

Prone to raise false

positives

Black-box approach

Unclassified alerts

Requires initial

trainin

g

An IDS must keep up track of all the data,

networking components and devices involved.

Additional requirements must be met when

developing a cloud-based intrusion detection system

due to its complexity and integrated services.

6.2 Support Vector Machine

Support Vector Machine is a classification artificial

intelligence and machine learning algorithm with a

set containing of points of two types in X

dimensional place. Support vector machine

generates a (X— 1) dimensional hyperplane for

separating these points into two or more groups

ICICIS 2021 - International Conference on Innovations in Computer and Information Science

44

using either linear kernel or non-linear kernel

functions (Menzes et al., 2016). Kernel functions

provides

a method for polynomial, radial and multi-

layer perception classifiers such as classification of

bank performance into four clusters of strong,

satisfactory, moderate and poor performance. The

class of bank performance is defined by the function

Performance class =





wxf .

=







j

jj

wxf

Where x

⃗

is the input vector to the support vector

classifier and w



⃗

is the real vector of weights and f is

the function that translates the dot product of the

input and real vector of weights into desired classes

of bank performance. w



⃗

Is learned from the labeled

training data set.

6.3 KNN Algorithm

The K-NN algorithm is a non-parametric supervised

machine learning technique that endeavors to

classify a data point from given categories with the

support of the training dataset (Menzes et al., 2016).

Predictions are performed for a new object (y) by

searching through the whole training dataset for the

K most similar instances or neighbors. The

algorithm does this by calculating the Euclidean

distance as follows:

  







m

i

ii

yxyxd

1

2

,

Where 𝑑



𝑥, 𝑦



is the distance measure for finding

the similarity between new observations and training

cases and then finding the k-closest instance to the

new instance. Variables are standardized before

calculating the distance since they are measured in

different units. Standardization is performed by the

following function:

d

s

meanX

X

s

.





Where X



is the standardized value, X is the instance

measure, mean and s.d are the mean and standard

deviation of instances. Lower values of K are

sensitive to outliers and higher values are more

resilient to outliers and more voters are considered

to decide the prediction.

6.4 Multi Linear Discriminant Analysis

(LDA)

The Linear Discriminant Analysis is a

dimensionality reduction technique. Dimensionality

reduction is the technique of reducing the amount of

random variables under consideration through

finding a set of principal variables (Menzes et al.,

2016) which is also known as course of

dimensionality. The LDA calculates the separability

between n classes also known as between-class

variance. Let D



be the distance between n classes.





T

i

g

i

iib

xxxxND 



 1

Where 𝑥 the overall is mean, 𝑥



and 𝑁



are the

sample mean and sizes of the respective classes. The

within-class variance is then calculated, which is the

distance between mean and the sample of every

class. Let S



be the within class variance.





2

,

1

,

1

iji

g

t

N

j

ijii

g

t

iy

XXXXSNS

i







The final procedure is to then construct the lower

dimensional space for maximization of the

seperability between classes and the minimization of

within class variance. Let P be the lower

dimensional space.

P = 𝑎𝑟𝑔



max

|







|

|







|

The LDA estimates the probability that a new

instance belongs to every class. Bayes Theorem is

used to estimate the probabilities. For instance, if the

output of the class is (a) and the input is (b) then











 bfaPbfaaPbBxYP |*|/*||

P|a is the prior probability of each class as observed

in the training dataset and f(b) is the estimated

probability of b belonging to the class, f(b) uses the

Gaussian distribution function to determine whether

b belongs to that particular class.

6.5 Random Forest Classifier

The Random Forest classifier is an ensemble

algorithm used for both classification and regression

problems. It creates a set of decision trees from a

randomly selected subset of the training set (Menzes

et al., 2016). The tree with higher error rates are

given low weight in comparison to other trees

increasing the impact of trees with low error rate.

6.6 Variable Importance

Variable importance was implemented using the

Performance of Machine Learning and Big Data Analytics Paradigms in Cyber-security and Cloud Computing Platforms

45

Boruta algorithm to improve model efficiency. The

Boruta algorithm endeavors to internment all the

key, interesting features existing in the dataset with

respect to an outcome variable. The diagram below

shows that net profit is the most significant feature,

followed by ROA, total assets, ROE and other

variables depicted below in Figure 10.



Figure 10: Boruta Algorithm important features.

The next procedure was fitting these variable into

our algorithms and hence evaluating their

performance using the metrics discussed in the

models section. The Boruta algorithm also clusters

banks on important variable as shown below in

Figure 11 for effective risk management and

analysis.



Figure 11: Boruta algorithm clustering banks based on

non-performing loans.

6.7 Model Results

Before we discuss the results of our models. It is

imperative to discuss the distribution of our dataset.

We classify bank performance into four classes

which are strong, satisfactory, moderate and poor

performing banks. A strongly performing bank is the

one with incredible CAMELS indicators. Its

profitability indicators are high, the management

quality is top of the class, less sensitive to market

movements with a high quality asset base. A

satisfactory bank is the one with acceptable but not

outstanding performance.

Our dataset comprises thousands of records from

banking institutions returns. The distribution of

performance classes is shown on the diagram below.

We can see that strong banks comprise of 12.9%,

satisfactory banks 15.1%, moderate banks 47.5%

and poor banks 24.5%. Figure 12 visualizes the

effectiveness of Boruta algorithm in determining the

most important variables that determines the

condition of a bank.

Figure 12: Distribution of the big dataset.

6.8 Classification and Regression Trees

(CART)

Table 6 below shows the performance results of our

CART algorithm in predicting bank failure on the

training set. The algorithm’s level of accuracy on the

training dataset was 82.8%. The best tune or

complexity parameter of our optimal model was

0.068. The Kappa statistic was 75% envisaging that

our classifier was effective as also shown with the

Kappa SD of 0.07 in the classification of bank

categories. On the test dataset, the algorithm

achieved an accuracy level of 92.5% and a kappa of

88.72%. The algorithm only misclassified 2 instance

as moderate and 1 as satisfactory.

Table 6: CART model performance.

Complexity

Parameter

Accuracy Kappa

AccuracyS

D

KappaSD

0.06849315 0.8275092 0.7519499 0.04976459 0.07072572

0.15753425 0.7783150 0.6683229 0.07720896 0.14039942

0.42465753 0.5222344 0.1148591 0.08183351 0.18732422

12,90%

15,10%

47,50%

24,50%

Strong

Satisfactory

Flawed

Poor

ICICIS 2021 - International Conference on Innovations in Computer and Information Science

46

The accuracy of the CART model based on the

complexity parameters of different test runs is

shown on Figure 13 below. The complexity

parameter or the best tune parameter of 0.068

optimized the model performance.

Figure 13: CART accuracy curve.

6.9 Support Vector Machine

The accuracy level of the SVM model on the

training dataset was 79.1% in predicting bank

solvency as shown in table 7. The best tune sigma

and cost values of our highly performing model

where 0.05 and 1 as shown on Figure 14 below. The

Kappa statistic and the Kappa SD where 67.9% and

0.13 respectively. On the test dataset, the algorithm

achieved an accuracy level of 92.5% and a kappa of

88.54%. The algorithm only misclassified 3 instance

as moderate in comparison to the CART algorithm.

Figure 14: SVM accuracy curve.

Table 7: Support Vector Machine performance.

sigma c Accuracy Kappa AccuracySD KappaSD

0.050398 0.25 0.783223 0.678536 0.095598 0.140312

0.050398 0.50 0.776007 0.661354 0.087866 0.132552

0.050398 1.00 0.791391 0.678694 0.080339 0.126466

6.10 Linear Discriminant Algorithm

Table 8: Linear Discriminant algorithm performance.

Accuracy Kappa AccuracySD KappaSD

0.8042399 0.7038131 0.1016816 0.159307

On the training dataset, the LDA achieved an

accuracy level of 80% as in table 8. The Kappa

statistic and the Kappa SD where 70% and 0.16

respectively. On the test dataset, the algorithm

achieved an accuracy level of 90% and a kappa of

84.64%. The algorithm only misclassified 4 instance

as moderate whose performance is poor in

comparison to the CART algorithm.

6.11 K-Nearest Neighbor

Table 9: K-NN algorithm performance.

K Accuracy Kappa AccuracySD KappaSD

5 0.5988645 0.3698931 0.1280376 0.2158109

7 0.6268864 0.4072928 0.1564920 0.2703504

9 0.6621978 0.4715556 0.1747903 0.2881390

The level of accuracy on the training dataset was

66.2%. The best tune parameter for our model was

k=9 or 9 neighbors as shown on the accuracy curve

in Figure 15 below. The Kappa statistic and the

Kappa SD where 47.2% and 0.17 respectively. On

the test dataset, the algorithm achieved an accuracy

level of 67.5% and a kappa of 49%. The algorithm

was not highly effective in classifying bank

performance in comparison to other algorithms.

Figure 15: K-NN confusion accuracy graph.

Performance of Machine Learning and Big Data Analytics Paradigms in Cyber-security and Cloud Computing Platforms

47

6.12 Random Forest

Table 10: Random Forest performance.

mtry Accuracy Kappa AccuracySD KappaSD

2 0.8272527 0.7421420 0.10396454 0.15420079

14 0.8554212 0.7829891 0.06069716 0.09303130

16 0.8482784 0.7718935 0.06455248 0.09881991

On the training set, the accuracy of our random

forest was 85.5% as designated in table 10. The best

tune parameter for our model was the mtry of 14

which is the number of randomly selected predictors

in constructing trees as shown on Figure 16. The

Kappa statistic and the Kappa SD where 78.3% and

0.09 respectively. On the test dataset, the algorithm

achieved an accuracy level of 96% and a kappa of

96%. The algorithm was highly effective in

classifying bank performance in comparison to all

algorithms.

Figure 16: Random forest accuracy graph.

6.13 Challenges and Future Direction

As number of banking activities increase, also

implies that the data submission to the Reserve Bank

continues to grow exponentially. This challenging

situation in combination with advances in machine

learning (ML) and artificial intelligence (AI)

presents unlimited opportunities to apply neural

network-based deep learning (DL) approaches to

predict Zimbabwean Bank’s solvency. Future work

will focus on identifying more features that could

possibly lead to poor bank performance and

incorporate these in our models to develop a robust

early warning supervisory tool based on big data

analytics, machine learning and artificial

intelligence.

The researcher analyses the two models that have

been proposed in literature with reference to an ideal

data analytics model for cybersecurity presented in

section 3.

6.13.1 Model 1: Experimental/ Prototype

Model

In the first case the researcher makes reference to the

model presented in (Petrenko and Makovechuk,

2020) which although developed in the context of

the public sector can be applied to the private sector

organizations. Table 11 below summarizes the main

characteristics of the experimental model.

Software and Hardware Complex (SHC):

Warning-2016

Table 11: Experimental big data analytics model for

cybersecurity.

MODEL

ATTRIBUTES

DESCRIPTION

HBase working on

HDFS (Hadoop

Distributed File

System)

 HBase, a non-relational database,

facilitates analytical and predictive

operations

 Enables users to assess cyber-threats and

the dependability of critical infrastructure

Analytical data

processing module

 Processes large amounts of data, interacts

with standard configurations servers and

is implemented at C language

 Special interactive tools (based on

JavaScript/ CSS/ DHTML) and libraries

(for example jQuery) developed to work

with content of the proper provision of

cybersecurity

Special interactive tools

and libraries

 Interactive tools based on JavaScript/

CSS/ DHTML

 Libraries for example jQuery developed

to work with content for

 Designed to ensure the proper provision

of cybersecurity

Data store for example

(MySQL)

 Percona Server with the ExtraDB engine

 DB servers are integrated into a multi-

master cluster using the Galera Cluster.

Task queues and data

caching

 Redis

Database servers

balancer

 Haproxy

Web server

 nginx, involved PHP-FPM with APC

enabled

HTTP requests balancer

 DNS (Multiple A-records)

Development of special

client applications

running Apple iOS

 Programming languages are used:

Objective C, C++, Apple iOS SDK based

on Cocoa Touch, CoreData, and UIKit.

Development of

applications running

Android OS

 Google SDK

Software development for

the web platform

 PHP and JavaScript.

Speed of the service and

protection from DoS

attacks

 CloudFare (through the use of CDN)

(Source: (Petrenko and Makovechuk, 2020)).

ICICIS 2021 - International Conference on Innovations in Computer and Information Science

48

The proposed model, it is to be noted was

demonstrated to be effective in integrating big data

analytics with cybersecurity in a cost effective way

(Petrenko and Makovechuk, 2020).

6.13.2 Model 2: Cloud

Computing/Outsourcing

The second model involves an organization

outsourcing its data to a cloud computing service

provider. Cloud computing service providers usually

have advanced big data analytics models, with

advanced detection and prediction algorithms and

better state of the art cybersecurity technologies and

better protocols because they specialize in data and

networks. However, it is to be noted that cloud

computing service providers are neither exempt nor

immune from cyber-threats and attacks(Mazumdar

and Wang, 2018).

6.13.3 Application of Big Data Analytics

Models in Cybersecurity

The researcher demonstrated by identifying the

characteristics of an effective data analytics model,

the ideal model, that it is possible to evaluate

different models. While the review of literature

showed that institutions and countries adopt

different big data analytics models for cybersecurity,

the researcher also demonstrated that beside the

unique requirements these models share major

common characteristics for example reactors and

detection algorithms are usually present in every

model but differ in terms of complexity.

The first experimental or prototype model

involves the design, and implementation of a

prototype by an institution and the second model

involves the use serviced provided by cloud

computing companies. By applying such analytics to

big data, valuable information can be extracted and

exploited to enhance decision making and support

informed decisions.

7 CONCLUSION

The main characteristic of Machine Learning is the

automatic data analysis of large data sets and

production of models for the general relationships

found among data. Big data analytics is not only

about the size of data but also clinches on volume,

variety and velocity of data. The information that

was evaluated in Big Data Analytics includes a

mixer of unstructured and semi-structured data, for

instance, social media content, mobile phone records,

web server logs, and internet click stream data.

A Cloud computing setting was added which has

advanced big data analytics models and advanced

detection and prediction algorithms to strengthen the

cybersecurity system. IoT requires both cloud

computing environment to handle its data exchange

and processing; and the use of artificial intelligence

for data mining and data analytics.

Big data analytics makes use of analytic

techniques such as data mining, machine learning,

artificial learning, statistics, and natural language

processing. In an age of transformation and

expansion in the Internet of Things , cloud

computing services and big data, cyber-attacks have

become enhanced and complicated , and therefore

cybersecurity events become difficult or impossible

to detect using traditional detection systems.

As a result, there is an imperative for security

network administrators to be more flexible,

adaptable, and provide robust cyber defense systems

in real-time detection of cyber threats.The key

problem is to evaluate Machine Learning and Big

Data Analytics paradigms for use in Cybersecurity.

The traditional examples of machine learning

algorithms include Linear regression, Logistic

regression, Linear discriminant analysis,

classification and regression trees, Naïve bayes, K-

Nearest Neighbour , Kmeans clustering Learning

Vector Quantization , Support Vector Machines ,

Random Forest, Monte Carlo, Neural networks and

Q-learning.

REFERENCES

Berman, D.S., Buczak, A.L., Chavis, J.S., and Corbett,

C.L. (2019). “Survey of Deep Learning Methods for

Cyber Security”, Information 2019, 10, 122;

doi:10.3390/info10040122.

Sarker, I. H., Kayes, A. S. M., Badsha, S., Alqahtani, H.,

Watters, P., & Ng, A. (2020). Cybersecurity data

science: an overview from machine learning

perspective. Journal of Big Data.

https://doi.org/10.1186/s40537-020-00318-5

Bringas, P.B., and Santos, I., (2010). Bayesian Networks

for Network Intrusion Detection, Bayesian Network,

Ahmed Rebai (Ed.), ISBN: 978-953-307-124-4,

InTech, Available from: http://www.intechopen.com/

books/bayesian-network/bayesian-networks-for-

network-intrusion-detection

Truong, T.C; Diep, Q.B.; & Zelinka, I. (2020). Artificial

Intelligence in the Cyber Domain: Offense and

Defense. Symmetry 2020, 12, 410.

Stefanova, Z.S., (2018). "Machine Learning Methods for

Network Intrusion Detection and Intrusion Prevention

Performance of Machine Learning and Big Data Analytics Paradigms in Cyber-security and Cloud Computing Platforms

49

Systems", Graduate Theses and Dissertations, 2018,

https://scholarcommons.usf.edu/etd/7367

Proko, E., Hyso, A., and Gjylapi, D. (2018). Machine

Learning Algorithms in Cybersecurity,

http://www.CEURS-WS.org/Vol-2280/paper-32.pdf

Mazumdar, S & Wang J (2018). Big Data and Cyber

security: A visual Analytics perspective in S.

Parkinson et al (Eds), Guide to Vulnerability Analysis

for Computer Networks and Systems.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S.,

Gani, A., & Ullah Khan, S. (2015). The rise of “big

data” on cloud computing: Review and open research

issues. In Information Systems. https://doi.org/

10.1016/j.is.2014.07.006

Siti Nurul Mahfuzah, M., Sazilah, S., & Norasiken, B.

(2017). An Analysis of Gamification Elements in

Online Learning To Enhance Learning Engagement.

6th International Conference on Computing &

Informatics.

Moorthy, M., Baby, R. & Senthamaraiselvi, S., 2014. An

Analysis for Big Data and its Technologies.

International Journal of Computer Science

Engineering and Technology( IJCSET), 4(12), pp.

413-415.

Cox, R. & Wang, G., 2014. Predicting the US bank

failure: A discriminant analysis. Economic Analysis

and Policy, Issue 44.2, pp. 201-211.

Hammond, K., 2015. Practical Artificial Intelligence For

Dummies®, Narrative Science Edition. Hoboken, New

Jersey: John Wiley & Sons, Inc.

Yang, C., Yu, M., Hu, F., Jiang, Y., & Li, Y. (2017).

Utilizing Cloud Computing to address big geospatial

data challenges. Computers, Environment and Urban

Systems.

https://doi.org/10.1016/j.compenvurbsys.2016.10.010

Jiang, W., Wang, L., & Lin, H. (2016). The role of

cognitive processes and individual differences in the

relationship between abusive supervision and

employee career satisfaction. Personality and

Individual Differences. https://doi.org/10.1016/

j.paid.2016.04.088

Fernando, J. I., & Dawson, L. L. (2009). The health

information system security threat lifecycle: An

informatics theory. International Journal of Medical

Informatics.

https://doi.org/10.1016/j.ijmedinf.2009.08.006

Menzes, F.S.D., Liska, G.R., Cirillo, M.A. and Vivanco,

M.J.F. (2016) Data Classification with Binary

Response through the Boosting Algorithm and

Logistic Regression. Expert Systems with

Applications, 69, 62-73. https://doi.org/10.1016/

j.eswa.2016.08.014

Petrenko, S A & Makovechuk K A (2020). Big Data

Technologies for Cybersecurity.

Pense (2014), Pesquisa Nacional de Saude do Escolar, Rio

de Janeiro, RJ - Brazil.

Xin, Y., Kong, L., Liu, Z., Chen, Y., Li, Y., Zhu, H., Gao,

M., Hou, H., & Wang, C. (2018). Machine Learning

and Deep Learning Methods for Cybersecurity. IEEE

Access, 6, 35365–35381. https://doi.org/10.1109/

ACCESS.2018.2836950

Umamaheswari, K., and Sujatha, S., (2017). Impregnable

Defence Architecture using Dynamic Correlation-

based Graded Intrusion Detection System for Cloud,

Defence Science Journal, Vol. 67, No. 6, November

2017, pp. 645-653, DOI : 10.14429/dsj.67.11118.

Gheyas, I. A. & Abdallah, A. E. (2016). Detection and

prediction of insider threats to cyber security: A

systematic Literature Review and Meta-Analysis., Big

Data Analytics (2016) 1:6.

Buyya, R., Yeo, C. S., & Venugopal, S. (2008). Market-

oriented cloud computing: Vision, hype, and reality

for delivering IT services as computing utilities.

Proceedings - 10th IEEE International Conference on

High Performance Computing and Communications,

HPCC 2008. https://doi.org/10.1109/HPCC.2008.172

Hadi, J., (2015) ‘Big Data and Five V’S Characteristics’,

International Journal of Advances in Electronics and

Computer Science, (2), pp. 2393–2835.

Suryavanshi, A., (2017), “Magnesium oxide nanoparticle-

loaded polycaprolactone composite electrospun fiber

scaffolds for bone–soft tissue engineering

applications: in-vitro and in-vivo evaluation”, 2017

Biomed. Mater. 12 055011, https://iopscience.iop.org/

article/10.1088/1748-605X/aa792b/pdf

Burt, D., Nicholas, P., Sullivan, K., & Scoles, T. (2013).

Cybersecurity Risk Paradox. Microsoft SIR.

Napanda, K., Shah, H., and Kurup, L., (2015). Artificial

Intelligence Techniques for Network Intrusion

Detection, International Journal of Engineering

Research & Technology (IJERT), ISSN: 2278-0181,

IJERTV4IS110283 www.ijert.org, Vol. 4 Issue 11,

November-2015.

Marzantowicz, (2015), Corporate Social Responsibility of

TSL sector: attitude analysis in the light of research,

„Logistyka” 2014, No. 5, pp. 1773—1785.

Sen and Tiwari, (2017). Port sustainability and stakeholder

management in supply chains: A framework on

resource dependence theory, The Asian Journal of

Shipping and Logistics, No. 28 (3): 301-319.

Fehling, C., Leymann, F., Retter, R., Schupeck, W., &

Arbitter, P. (2014). Cloud Computing Patterns. In

Cloud Computing Patterns. https://doi.org/10.1007/

978-3-7091-1568-8

KPMG (2018) , Clarity on Cybersecurity. Driving growth

with confidence.

Kobielus, J., (2018). Deploying Big Data Analytics

Applica-

tions to the Cloud: Roadmap for Success.

Cloud Standards Customer

Council

Lee, J. (2017). Hacking into China’s cybersecurity law, In:

IEEE International Conference on Distributed

Computing Systems (2017).

Iafrate, F., (2015), From Big Data to Smart Data, ISBN:

978-1-848-21755-3 March, 2015, Wiley-ISTE, 190

Pages.

Pavan Vadapalli, (2020). “AI vs Human Intelligence:

Difference Between AI & Human Intelligence”, 15th

September, 2020, https://www.upgrad.com/blog/ai-vs-

human-intelligence/

Almutairi, A., (2016). Improving intrusion detection

systems using data mining techniques, Ph.D Thesis,

Loughborough University, 2016.

ICICIS 2021 - International Conference on Innovations in Computer and Information Science

50