Homomorphic Encryption at Work for Private Analysis of Security Logs

Aymen Boudguiga, Oana Stan, Hichem Sedjelmaci and Sergiu Carpov

CEA-LIST, 91191, Gif-sur-Yvette, France

Orange Labs, 92320, Châtillon, France

Keywords:

Privacy, Log Management, SIEM, Homomorphic Encryption.

Abstract:

One important component of incident handling in cyber-security is log management. In practice, different software and/or hardware components of a system, such as Intrusion Detection Systems (IDS) or firewalls, analyze network traffic and log suspicious events or activities. These logs are timestamped, gathered by a log collector and centralized within a log analyzer. A Security Incidents and Events Management (SIEM) system is an example of such a log analysis tool. A SIEM can be a dedicated network device or a Cloud service offered by a security services provider. Providing SIEM as a Cloud service raises privacy issues, as logs contain confidential information that must not be disclosed to third parties. In this work, we investigate the possible use of homomorphic encryption to provide a privacy-preserving log management architecture. We explain how a SIEM can be adapted to process encrypted logs. In addition, we evaluate the homomorphic classification of IDS alerts from the NSL-KDD data set with a linear SVM model.

1 INTRODUCTION

Nowadays, Security Incidents and Events Management (SIEM) systems (Scarfone and Souppaya, 2006) are established as a reference for cyber-security log analysis. A SIEM can be a dedicated network device or a Cloud service offered by a security services provider. A SIEM receives and processes logs in quasi real-time (i.e., online), but it may also support offline log processing (Limmer and Dressler, 2008). It is mainly in charge of log normalization, analysis, correlation and storage. In practice, a SIEM either responds to detected incidents with appropriate countermeasures or generates automated alerts about malicious activities for security administrators within a Security Operations Center (SOC) (Jarpey and McCoy, 2017; Nathans, 2015).

The SOC provides a dashboard interface for incident visualization, which facilitates monitoring by incident analysts. The latter review the SIEM automated analysis and investigate its alerts closely. They are in charge of attack forensics, log storage, remote policy enforcement and incident response. That is, the SOC deploys security patches and adequate countermeasures on vulnerable devices. SOCs are regulated by law, and SOC employees risk imprisonment if they disclose any information about the analyzed logs of their clients.

Problem Statement – For accessibility and ease of management, SIEM can be provided by third parties as distributed services in the Cloud. When an IDS sends its logs to a remote SIEM to signal an incident on a device, these logs can be intercepted by an attacker. The latter can exploit the threats related to the reported incident to attempt an attack on the vulnerable device. This attack becomes more dangerous when the reported incident refers to a well-known device that is deployed in many systems simultaneously. For example, an IDS can report the presence of malware disclosing the user logins and passwords of a database. If the attacker knows that many companies are using the same vulnerable database, she can target all these companies' databases with the same malware. Consequently, it becomes compulsory to ensure log confidentiality and integrity during transmission to a third-party SIEM. Fortunately, appropriate encryption and message authentication codes can solve this problem.

However, when SIEMs are managed by third parties that are honest-but-curious, classical encryption is not enough to preserve log privacy. Indeed, the SIEM can decrypt the received logs, analyze them and send back incident responses to the log generators. In the meantime, a corrupted SIEM can take advantage of the decrypted logs to gather information about the targeted device and its hosting system. This information serves to create target profiles that interest security services providers. Alternatively, the SIEM can undermine a company's brand image by selling its alert reports to competitors. It is in this context that homomorphic encryption is useful for log privacy, as an attractive alternative to classical encryption. With it, the SIEM analyzes encrypted data without access to sensitive information about the origin and nature of the incident.

Boudguiga, A., Stan, O., Sedjelmaci, H. and Carpov, S.
Homomorphic Encryption at Work for Private Analysis of Security Logs.
DOI: 10.5220/0008969205150523
In Proceedings of the 6th International Conference on Information Systems Security and Privacy (ICISSP 2020), pages 515-523
ISBN: 978-989-758-399-5; ISSN: 2184-4356

Homomorphic encryption is also interesting for forensics use cases. Let us consider an incident that targets a device deployed in many systems simultaneously. The log generator encrypts its logs with a homomorphic scheme to prevent the leakage of sensitive information during the investigation of the incident. As such, an adversary with access to the encrypted logs gets no information that would allow her to run a massive attack on many companies using the same vulnerable device.

Finally, current regulation efforts regarding private data management, such as the EU 2016/679 General Data Protection Regulation (GDPR) (European Parliament and Council, 2016), are incentives for applying homomorphic encryption to log privacy. Indeed, GDPR Article 4.1 considers log content as private data, and the consent of log generators is compulsory for log collection and classification. However, GDPR Recital 49 indicates that consent is not necessary if data processing is vital to provide system security against malicious activities. Homomorphic encryption is thus a good compromise between preserving log privacy and processing logs to detect malicious activities.

Contributions – In this work, we investigate the use of homomorphic encryption during log management to protect private data. We first characterize the sensitive fields of a log entry that must not be disclosed to an adversary. Then, we make log generators encrypt their logs, in a form suited to transciphering, before their transmission to log analyzers (i.e., SIEM). The latter process the encrypted logs and transmit the result to the SOC. The SOC only has to decrypt the received result to get the SIEM analysis report. Finally, we study the feasibility of our proposal by simulating a homomorphic log analysis with the NSL-KDD data set.

Paper Organization – Section 2.1 reviews the

main components of a log management architecture.

Section 2.2 presents the related works on privacy-

preserving log management. Section 3 speciﬁes our

privacy-preserving log management protocol that re-

lies on homomorphic encryption. Section 4 discusses

the performance of our proposal. Finally, Section 5

concludes the paper with future improvements.

EU GDPR has been applicable since May 2018 in all EU member states to harmonize privacy laws across Europe.

Figure 1: Log management components. The diagram shows devices, gateways, the Internet and the Cloud connected by communication links, with three roles: log generator (IDS, FW...), log analyzer (SIEM) and log monitor (SOC). The functions shown are Collect, Log, Normalize, Analyze, Correlate, Monitor, Visualize, Investigate, Respond, Store and Remove.

2 BACKGROUND AND TOOLS

In this section, we describe the log management ar-

chitecture. In addition, we review the state of the art

on privacy preserving log management. Finally, we

introduce homomorphic encryption and the notations

followed in this work.

2.1 Log Management Architecture

In this section, we review the main components that take part in a log management architecture. To do so, we consider the abstract network architecture described in Figure 1. It contains a set of heterogeneous devices. A device supports several types of communication: Device-to-Device, Device-to-Cloud and Device-to-Gateway. Some devices, such as intrusion detection systems (IDS), firewalls, gateways or servers, create logs to store information about their internal state or to monitor the state of their hosting network. We refer to these devices as log generators (Scarfone and Souppaya, 2006). Log generators are synchronized to ensure that the logs they produce are correctly timestamped. Indeed, timestamping is compulsory to reorder logged events during incident investigation.

According to (Allison et al., 2013), the most critical information to log relates to: authentication and authorization, system configuration, network traffic, assets, malware and critical system failures. In this paper, we only consider alerts and logs coming from IDS, for the sake of simplicity. IDS report malicious and suspicious activities detected on a host machine (HIDS) or a network (NIDS) (Dali et al., 2015). Note that in practice, IDS implement a signature-based and/or an anomaly-based detection policy. Signature-based IDS rely on a set of predefined rules and patterns to detect misbehaviors. Meanwhile, anomaly-based IDS use machine learning algorithms to detect behavior deviation from a reference state (Aburomman and Reaz, 2017).

Note that the considered architecture covers major IoT use cases: smart home, smart grids, factory 4.0 and intelligent transportation systems.

The log analyzer, namely the SIEM, can be agentless or agent-based (Scarfone and Souppaya, 2006). An agentless SIEM receives logs from log generators without requiring dedicated log transfer software on these generators. Meanwhile, an agent-based SIEM relies on clients installed on log generators to transmit logs periodically. The SIEM is in charge of log normalization. Indeed, it manages different types of logs whose content, format and timestamps are inconsistent. Normalization rewrites these heterogeneous logs into a single format. It extracts features from logs such as the transport protocol type (e.g., TCP or UDP), port number, IP addresses and timestamps. These features are the serialization tokens of the new log format.
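As an illustration of the normalization step, the following sketch rewrites two log lines written in different formats into a single record. Both raw formats, the field names and the `normalize` helper are hypothetical and chosen only for this example; real log formats differ from one product to another.

```python
import re
from datetime import datetime, timezone

# Two hypothetical raw formats (assumptions made for this example only).
FW_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z (?P<proto>TCP|UDP) "
    r"(?P<src>[\d.]+):(?P<sport>\d+) -> (?P<dst>[\d.]+):(?P<dport>\d+)"
)
IDS_PATTERN = re.compile(
    r"epoch=(?P<epoch>\d+) proto=(?P<proto>\w+) src=(?P<src>[\d.]+) "
    r"sport=(?P<sport>\d+) dst=(?P<dst>[\d.]+) dport=(?P<dport>\d+)"
)

def normalize(line):
    """Rewrite one raw log line into a common dictionary format."""
    m = FW_PATTERN.match(line)
    if m:
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    else:
        m = IDS_PATTERN.match(line)
        if m is None:
            raise ValueError("unknown log format")
        ts = datetime.fromtimestamp(int(m["epoch"]), tz=timezone.utc)
    return {
        "timestamp": int(ts.timestamp()),  # seconds since the Unix epoch, UTC
        "protocol": m["proto"].upper(),
        "src_ip": m["src"],
        "src_port": int(m["sport"]),
        "dst_ip": m["dst"],
        "dst_port": int(m["dport"]),
    }

fw = normalize("2020-02-25T10:00:00Z TCP 10.0.0.5:51000 -> 192.0.2.7:443")
ids = normalize("epoch=1582624800 proto=tcp src=10.0.0.5 sport=51000 dst=192.0.2.7 dport=443")
```

Both calls describe the same event, so they yield the same normalized record despite the different timestamp and protocol conventions of the inputs.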

The SIEM analyzes log content by looking for patterns of simple attacks using a rule-based or a learning-based policy. In addition, the SIEM correlates logs in order to reduce the number of false positives, to remove redundant alerts and to capture linked alerts that belong to the same attack pattern (Beng et al., 2013). By false positives, we refer to events that are not relevant for the analysis or erroneous events signaled as threats. A simple correlation algorithm consists in clustering logged events by comparing the similarity of their features (e.g., same source IP address and port). Other correlation algorithms rely on machine learning with a training phase to detect complex alerts.
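The simple clustering approach can be sketched as follows. This is a minimal illustration, not the algorithm of a particular SIEM: the alert fields, the `correlate` helper and the 60-second time window are assumptions introduced for the example.

```python
from collections import defaultdict

def correlate(alerts, window=60):
    """Cluster alerts sharing the same source IP and port within `window` seconds."""
    by_key = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        by_key[(alert["src_ip"], alert["src_port"])].append(alert)

    clusters = []
    for key, group in by_key.items():
        current = [group[0]]
        for alert in group[1:]:
            # Start a new cluster when the gap to the previous alert is too large.
            if alert["timestamp"] - current[-1]["timestamp"] > window:
                clusters.append((key, current))
                current = [alert]
            else:
                current.append(alert)
        clusters.append((key, current))
    return clusters

alerts = [
    {"timestamp": 100, "src_ip": "10.0.0.5", "src_port": 51000, "signature": "port scan"},
    {"timestamp": 110, "src_ip": "10.0.0.5", "src_port": 51000, "signature": "port scan"},
    {"timestamp": 115, "src_ip": "10.0.0.9", "src_port": 40000, "signature": "brute force"},
    {"timestamp": 900, "src_ip": "10.0.0.5", "src_port": 51000, "signature": "port scan"},
]
clusters = correlate(alerts)
sizes = [len(group) for _, group in clusters]
```

The two alerts raised ten seconds apart by the same source end up in one cluster, which is how redundant alerts can be collapsed into a single event for the analyst.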

Furthermore, the SIEM is in charge of log storage and disposal. Logs can be centralized at one SIEM or distributed over many SIEMs. In addition, a SIEM can be installed in a local area network or provided as a Cloud service by a security provider (Figure 1). Distributing the SIEM and hosting it in the Cloud has an advantage over a centralized local SIEM: a Cloud SIEM reduces the risk of losing logs during an attack against the local network. It backs up and protects logs even if their generators are compromised, their network is attacked and their local log databases are cleared.

The SIEM interacts with a Security Operations Center (SOC), i.e., the log monitor (Scarfone and Souppaya, 2006), to define appropriate reactions to incidents detected on devices. The SOC relies on a team of incident analysts to manage forensics tasks, the deployment of security countermeasures and updates, and incident reporting. In addition, the SOC provides a dashboard for alert visualization. In this work, we make a clear distinction between SIEM and SOC functions, as presented in Figure 1. The SIEM is a software service that runs automated processes. Meanwhile, the SOC is an infrastructure that gathers security experts who are in charge of investigating and responding to security threats. In the literature, SIEM and SOC are sometimes grouped into a single entity (referred to as SIEM).

2.2 Private Log Analysis

With the democratisation of the Cloud and the increasing complexity of SIEM tools, a natural solution is to migrate such systems to the Cloud. As such, log analysis can be offloaded and made available as SaaS (Software as a Service) on a remote server or a Cloud platform (e.g., (Lin et al., 2013)). In such a context, issues arise regarding the security of log transmission, processing and storage.

Some commercial SIEM solutions, such as IBM QRadar, store the received logs obfuscated and hashed. Other SIEMs allow log encryption, using a symmetric scheme such as AES, to secure data transmission between their different software components. Meanwhile, the approaches based on Syslog (e.g., syslog-sign, syslog-pseudo, reliable-syslog), the standard for network-wide logging, do not ensure log confidentiality in transit or during processing.

In (Ray et al., 2013), the authors propose secure

protocols for the anonymous upload, retrieval and

deletion of logs data over the Tor network. However,

their logging client is dependent on the chosen oper-

ating system and logs privacy is not totally addressed

since the logs can be identiﬁed through their tag val-

ues.

To the best of our knowledge, there are no ex-

isting works on the conﬁdentiality of log data, both

during transmission and analysis, using homomorphic

encryption techniques.

2.3 Homomorphic Encryption

Homomorphic Encryption (HE) schemes allow computations to be performed directly over encrypted data. That is, with a fully homomorphic encryption scheme $E$, we can compute $E(m_1 + m_2)$ and $E(m_1 \times m_2)$ from the encrypted messages $E(m_1)$ and $E(m_2)$. The first constructions of HE schemes, allowing either multiplication or addition over encrypted data, date back to the seventies (Rivest et al., 1978). Then, in 2009, Gentry (Gentry et al., 2009) proposed the first Fully Homomorphic Encryption (FHE) scheme able to evaluate an arbitrary number of additions and multiplications over encrypted data.


Starting from Gentry's breakthrough, many Somewhat HE and FHE schemes have been proposed in the literature (Brakerski et al., 2012; Fan and Vercauteren, 2012a; Van Dijk et al., 2010; López-Alt et al., 2012; Chillotti et al., 2016; Cheon et al., 2016). In (Acar et al., 2017), FHE schemes are classified into four main families: ideal lattice-based schemes (Gentry et al., 2009), schemes over the integers (Van Dijk et al., 2010), schemes based on the Learning With Errors (LWE) problem or its ring variant (RLWE) (Brakerski and Vaikuntanathan, 2011; Brakerski et al., 2012; Chillotti et al., 2016; Cheon et al., 2016) and NTRU-like schemes (López-Alt et al., 2012).

In practice, a public key homomorphic encryption scheme HE = (HE.Keygen, HE.Enc, HE.Dec, HE.Eval) is defined by the following probabilistic polynomial-time algorithms with respect to the security parameter $k$:

• $(pk, evk, sk) \leftarrow \mathrm{HE.Keygen}(1^k)$: outputs an encryption key $pk$, a public evaluation key $evk$ and a secret decryption key $sk$. The evaluation key is used during homomorphic operations. $evk$ corresponds to the relinearization key in levelled homomorphic schemes such as BFV (Fan and Vercauteren, 2012a) or to the bootstrapping key in gate-bootstrapped schemes such as TFHE (Chillotti et al., 2016).

• $c \leftarrow \mathrm{HE.Enc}_{pk}(m)$: encrypts a message $m$ into a ciphertext $c$ using the public key $pk$.

• $m \leftarrow \mathrm{HE.Dec}_{sk}(c)$: decrypts a ciphertext $c$ into a plaintext $m$ using the secret key $sk$.

• $c_f \leftarrow \mathrm{HE.Eval}_{evk}(f, c_1, \ldots, c_n)$: evaluates the function $f$ on the encrypted inputs $c_1, \ldots, c_n$ using the evaluation key $evk$.
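To make this four-algorithm interface concrete, here is a minimal sketch built on the Paillier cryptosystem. Paillier is a swapped-in stand-in: it is only additively homomorphic, it is not one of the lattice-based schemes discussed in this paper, and the tiny hard-coded primes make it insecure. The function names (`he_keygen`, `he_enc`, `he_dec`, `he_eval_linear`) are hypothetical, the security parameter is omitted, and the value returned as `evk` is simply the public modulus squared, since Paillier needs no real evaluation key. The code requires Python 3.8 or later for the modular inverse computed with `pow`.

```python
import math
import random

def he_keygen(p=1009, q=1013):
    """Toy Paillier key generation (tiny primes: insecure, demo only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    pk = n
    evk = n * n                      # stand-in: Paillier needs no real evaluation key
    sk = (phi, pow(phi, -1, n), n)   # (phi, phi^-1 mod n, n)
    return pk, evk, sk

def he_enc(pk, m):
    """Encrypt 0 <= m < n as (1 + m*n) * r^n mod n^2, with a fresh random r."""
    n = pk
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n * n) % (n * n)

def he_dec(sk, c):
    """Recover m from c: c^phi = 1 + m*phi*n (mod n^2)."""
    phi, phi_inv, n = sk
    u = pow(c, phi, n * n)
    return (u - 1) // n * phi_inv % n

def he_eval_linear(evk, weights, ciphertexts):
    """Homomorphically compute sum(w_i * m_i) for plaintext integer weights w_i >= 0."""
    acc = 1                          # 1 is a valid encryption of 0 (with r = 1)
    for w, c in zip(weights, ciphertexts):
        # Multiplying ciphertexts adds plaintexts; exponentiation scales them.
        acc = acc * pow(c, w, evk) % evk
    return acc

pk, evk, sk = he_keygen()
c1, c2 = he_enc(pk, 20), he_enc(pk, 22)
total = he_dec(sk, he_eval_linear(evk, [1, 1], [c1, c2]))      # 20 + 22
weighted = he_dec(sk, he_eval_linear(evk, [3, 2], [c1, c2]))   # 3*20 + 2*22
```

The evaluator only ever handles `pk`, `evk` and ciphertexts, which mirrors the separation of roles used later between the SIEM (evaluation) and the SOC (decryption).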

Nowadays, several FHE schemes are available (e.g., BFV, TFHE, CKKS (Cheon et al., 2016), etc.), and they can be mixed together using the CHIMERA framework (Boura et al., 2018). These schemes are fast enough to implement interesting use cases such as the evaluation of simple neural networks (Bourse et al., 2017; ?).

As for the overhead induced by the size of homomorphic ciphertexts during their transmission and storage, we can use transciphering (Canteaut et al., 2015). This cryptographic technique changes the data encryption algorithm from a classical symmetric encryption scheme to an HE scheme, without decrypting the data. Let $m$ be a plaintext, SYM a symmetric scheme with key $k$, $\mathrm{SYM.Enc}_k(m)$ the encryption of $m$ with SYM, and HE a homomorphic encryption scheme. With transciphering, it is enough to run the decryption circuit SYM.Dec in the homomorphic domain, using the homomorphic encryption of the symmetric key $\mathrm{HE.Enc}_{pk}(k)$, to obtain the message encrypted with $pk$:

$$\mathrm{HE.Eval}_{evk}\big(\mathrm{SYM.Dec}_{\mathrm{HE.Enc}_{pk}(k)}(\mathrm{SYM.Enc}_k(m))\big) = \mathrm{HE.Enc}_{pk}(m)$$
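As a worked illustration of this identity, consider a deliberately simplified symmetric scheme, introduced here only as an assumption, that masks a message additively modulo a plaintext modulus $t$. Practical transciphering relies on homomorphic-friendly stream or block ciphers (Canteaut et al., 2015), whose decryption circuits are far more involved.

$$\mathrm{SYM.Enc}_k(m) = m + k \bmod t, \qquad \mathrm{SYM.Dec}_k(c) = c - k \bmod t$$

Given $c = \mathrm{SYM.Enc}_k(m)$ and $\mathrm{HE.Enc}_{pk}(k)$, the decryption circuit is a single subtraction, which an additively homomorphic scheme with plaintext space $\mathbb{Z}_t$ can evaluate on ciphertexts (here $\ominus$ denotes homomorphic subtraction):

$$\mathrm{HE.Enc}_{pk}(c) \ominus \mathrm{HE.Enc}_{pk}(k) = \mathrm{HE.Enc}_{pk}(c - k \bmod t) = \mathrm{HE.Enc}_{pk}(m)$$

For instance, with $t = 257$, $k = 200$ and $m = 42$, the sender transmits $c = 242$, and the evaluator obtains an encryption of $242 - 200 = 42$ without ever seeing $k$ or $m$ in clear.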

2.4 Notations

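In the following sections, we denote vectors by bold letters, for example $\mathbf{x}$. Each vector $\mathbf{x}$ of $n$ elements can be represented as $\mathbf{x} = (x_1, \ldots, x_n)$. The transpose of a vector $\mathbf{x}$ is denoted $\mathbf{x}^T$. As such, the dot product between two vectors $\mathbf{x}$ and $\mathbf{y}$ is expressed as $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T \cdot \mathbf{y}$.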

3 HE-BASED LOG ANALYSIS

In this section, we ﬁrst present our considered threat

model. Then, we discuss the requirements of log

management to support homomorphic encryption op-

erations. Finally, we specify our privacy preserving

scheme for log exchange and treatment.

3.1 Threat Model

In this work, we consider an honest-but-curious model. In this model, many entities $(e_1, \ldots, e_n)$, holding the secret information $(s_1, \ldots, s_n)$, participate in a protocol $P$ to compute some function $F(s_1, \ldots, s_n)$. Each entity $e_i$, $i \in [1, n]$, is honest and must follow each step of $P$. However, $e_i$ is curious. That is, $e_i$ will try to find information about the secrets $s_j$, $j \neq i$, of the other entities. $P$ is secure in the honest-but-curious model if each $e_i$, $i \in [1, n]$, has no information other than $F(s_1, \ldots, s_n)$ at the end of the protocol.

In this work, log generators and the log analyser

(SIEM) are assumed honest-but-curious while the log

monitor (SOC) is assumed a trusted entity. A log gen-

erator may be interested in the features of other log

generators. Meanwhile, the SIEM can be interested in

recovering all the private features of log generators.

The SOC, considered an honest trusted party, generates its public, evaluation and secret keys (pk, evk and sk, respectively). Then, the SOC shares pk with the SIEM and the log generator. In addition, it provides evk to the SIEM. The latter is in charge of analyzing the log inputs and returns the encrypted result of the evaluation to the SOC.

We focus here on providing confidential log analysis and do not consider other security properties such as entity authentication or message integrity validation. We consider that "classical" cryptographic mechanisms are sufficient to provide such properties.

Figure 2: Log management components update for HE support. The figure repeats the functions of Figure 1 (Collect, Log, Normalize, Analyze, Correlate, Monitor, Visualize, Investigate, Respond, Store, Remove) for the log generator, the log analyzer and the log monitor, and adds a Transcipher step.

3.2 Requirements

In this section, we introduce the modifications needed to use homomorphic encryption in our log management architecture. Our major concern is mitigating the performance bottleneck of existing homomorphic encryption schemes, both in terms of memory usage for log transmission and processing, and in terms of execution time for log processing. To do so, we must carefully choose the log fields to be encrypted. In addition, we pay attention to the operations that the SIEM will run over encrypted data. Figure 2 presents the proposed extensions to the log management architecture. They concern the normalization step and the transciphering of log data.

First, we delegate the SIEM log normalization step to log generators. As such, the latter are in charge of rewriting their logs in a common format defined by the SIEM. Consequently, the SIEM is relieved from the cumbersome normalization of partially encrypted files. Second, we propose to encrypt only the sensitive fields of a log entry to reduce the volume of HE-encrypted data. That is, we distinguish between sensitive fields and merely informative fields in a log entry. Sensitive fields can disclose information about a log generator, such as: MAC address, IP address (if public), transport protocols (with their flags) and targeted services (ports). In contrast, informative fields do not provide any information about a log generator unless combined with a sensitive field. For example, a packet loss rate is considered an informative field. It is valuable only when combined with a service port and an IP address.
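The distinction between sensitive and informative fields can be sketched as a simple partition of a normalized log entry. The classification follows the examples given above, while the field names and the `split_entry` helper are hypothetical.

```python
# Sensitive field names are hypothetical; the categories (MAC address, IP
# addresses, transport protocol with flags, targeted port) come from the text.
SENSITIVE_FIELDS = {"mac", "src_ip", "dst_ip", "protocol", "flags", "dst_port"}

def split_entry(entry):
    """Return the (sensitive, informative) parts of a normalized log entry."""
    sensitive = {k: v for k, v in entry.items() if k in SENSITIVE_FIELDS}
    informative = {k: v for k, v in entry.items() if k not in SENSITIVE_FIELDS}
    return sensitive, informative

entry = {
    "mac": "00:11:22:33:44:55",
    "src_ip": "192.0.2.7",
    "protocol": "TCP",
    "flags": "SYN",
    "dst_port": 443,
    "packet_loss_rate": 0.02,
    "timestamp": 1582624800,
}
to_encrypt, in_clear = split_entry(entry)
```

Only `to_encrypt` would go through the encryption path, which keeps the volume of homomorphically processed data small; `in_clear` reveals nothing about the log generator on its own.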

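Finally, the SIEM transciphers the log files $l$, partially encrypted with a random symmetric key $k$, into files encrypted with the SOC homomorphic public key $\mathrm{HE}.pk_{SOC}$. That is, the SIEM computes $\mathrm{HE.Enc}_{pk}(l)$ from the symmetric encryption $\mathrm{SYM.Enc}_k(l)$ using $\mathrm{HE.Enc}_{pk}(k)$, sent by a log generator. Transciphering not only spares log generators the cumbersome homomorphic encryption, but also saves bandwidth because $\mathrm{SYM.Enc}_k(l)$ is smaller than $\mathrm{HE.Enc}_{pk}(l)$.

By classical, we refer, for example, to message authentication codes to provide message integrity. Or, we can add nonces to avoid replay attacks and counters to detect denial of service attacks.

Figure 3: HE privacy preserving log management protocol. The protocol involves the log generator, the SIEM and the SOC, and proceeds as follows.

Offline:
1. SOC: runs HE.Keygen on the security parameter and obtains $(pk, evk, sk)$; it sends $pk$ to the log generator and the SIEM, and $evk$ to the SIEM.

Online:
1. Log generator: runs SYM.Keygen on the security parameter and obtains $k$.
2. Log generator to SIEM: $\mathrm{HE.Enc}_{pk}(k)$, $\mathrm{SYM.Enc}_k(l)$.
3. SIEM, transcipher: $\mathrm{HE.Eval}_{evk}(\mathrm{SYM.Dec}_{\mathrm{HE.Enc}_{pk}(k)}(\mathrm{SYM.Enc}_k(l))) = \mathrm{HE.Enc}_{pk}(l)$.
4. SIEM, analyse logs: $\mathrm{HE.Eval}_{evk}(A, \mathrm{HE.Enc}_{pk}(l)) = \mathrm{HE.Enc}_{pk}(res)$.
5. SIEM to SOC: $\mathrm{HE.Enc}_{pk}(res)$, $\mathrm{HE.Enc}_{pk}(k)$.
6. SOC: $\mathrm{HE.Dec}_{sk}(\mathrm{HE.Enc}_{pk}(res)) = res$ and $\mathrm{HE.Dec}_{sk}(\mathrm{HE.Enc}_{pk}(k)) = k$.
7. SOC: chooses a security countermeasure $rec$.
8. SOC to log generator: $\mathrm{SYM.Enc}_k(rec)$.
9. Log generator: $\mathrm{SYM.Dec}_k(\mathrm{SYM.Enc}_k(rec)) = rec$.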

3.3 Private Log Analysis with HE

In this section, we detail our proposed protocol for protecting the privacy of log content. Our main idea is to make the SIEM analyze logs that are partially encrypted with an HE scheme. Figure 3 shows the three main entities of the log management architecture: the log generator, the SIEM and the SOC. The log generation step is carried out by an IDS, a firewall or any other information system. The SIEM and the SOC are distinct entities, hosted on remote servers or provided as Cloud services.

First, the SOC runs the key generation algorithm (HE.Keygen) and gets pk, evk and sk. Then, it transmits pk to the log generator and the SIEM, and evk to the SIEM only. The log generator chooses a symmetric key k for a homomorphic-friendly symmetric cryptosystem (Canteaut et al., 2015). This key can be renewed periodically.

In step 2, the log generator encrypts this symmetric key with the homomorphic public key, yielding $\mathrm{HE.Enc}_{pk}(k)$, and also encrypts the generated (and previously normalized) logs $l$ with the symmetric key, yielding $\mathrm{SYM.Enc}_k(l)$. The encrypted logs and the homomorphically encrypted symmetric key are sent to the SIEM. Note that the transmission of the encrypted key $\mathrm{HE.Enc}_{pk}(k)$ is independent of the transmission of the encrypted logs but must occur before step 3.

In step 3, the SIEM uses $\mathrm{HE.Enc}_{pk}(k)$ and $\mathrm{SYM.Enc}_k(l)$ to obtain the logs $\mathrm{HE.Enc}_{pk}(l)$ encrypted in the homomorphic domain (without having access at any moment to the clear logs).

Step 4 consists in the analysis of the logs in homomorphic format with the algorithm $A$. $A$ can be a machine learning algorithm or a rule-based process. The homomorphic result of this analysis, $\mathrm{HE.Enc}_{pk}(res)$, is sent to the SOC in step 5. The protocol also requires the transmission of $\mathrm{HE.Enc}_{pk}(k)$ to the SOC.

The SOC decrypts the result of the analysis and the symmetric key $k$ using the secret key $sk$ (step 6). Based on the obtained result $res$, it chooses an adapted security countermeasure, noted $rec$ (step 7), and sends it, encrypted with $k$, to the log generator (step 8). The result can be a classification of an event as unknown or dangerous, and the countermeasure can be a patch deployment or an alert report. Finally, the log generator decrypts $rec$ and takes the appropriate actions (step 9).

4 EXPERIMENTAL RESULTS

We propose to use a Support Vector Machine (SVM) with a linear kernel to classify NSL-KDD inputs under homomorphic encryption. The NSL-KDD dataset (Tavallaee et al., 2009) is an updated version of the KDDCUP'99 dataset. It is a public dataset for testing network-based anomaly detection systems. We use it to simulate logging inputs from an IDS. In the following, we review the SVM algorithm. Then, we present the performance of the classification of encrypted NSL-KDD inputs, obtained with the Microsoft SEAL library version 3.3 (SEAL, 2019).

4.1 Support Vector Machines

Support Vector Machines (SVM) is a supervised learning algorithm used for regression and classification. The SVM classifier determines a set of vectors, called support vectors, that serve to construct a hyperplane in the feature space. In our simulation, we use SVM as a binary classifier to predict whether an NSL-KDD input comes from normal traffic or from an anomaly.

Given the training dataset $(\mathbf{x}_i, y_i)_{i \in [[1,n]]}$, where $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$, we want to find with SVM the hyperplane that maximizes the margin: $\mathbf{w}^T \cdot \mathbf{x} + b = 0$, where $\mathbf{w}$ is the normal vector of the hyperplane and the parameter $b$ is an offset.

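We solve the following optimization problem to find the optimal hyperplane:

$$\begin{cases} \min\left(\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \varepsilon_i\right) \\ \text{s.t. } y_i(\mathbf{w}^T \cdot \mathbf{x}_i + b) \geq 1 - \varepsilon_i,\ \varepsilon_i \geq 0\ \forall i \in [[1, n]] \end{cases} \quad (1)$$

The term $\sum_{i=1}^{n} \varepsilon_i$ relaxes the constraints on the learning vectors, and $C$ is a constant that controls the tradeoff between the number of misclassifications and the margin maximization. Equation 1 can be solved with Lagrange multipliers (Schölkopf and Smola, 2003):

$$\begin{cases} \max\left(L(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)\right) \\ \text{s.t. } \sum_{i=1}^{n} \alpha_i y_i = 0,\ 0 \leq \alpha_i \leq C\ \forall i \in [[1, n]] \end{cases} \quad (2)$$

$K$ is the kernel function and $\alpha_i$, $\forall i \in [[1, n]]$, are the Lagrange multipliers. According to the Karush-Kuhn-Tucker (KKT) conditions (Karush, 2014; Kuhn and Tucker, 1951), the $\mathbf{x}_i$ that correspond to $\alpha_i > 0$ are called support vectors (SVs).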

Once the solution to Equation 2 is found, we get $\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i$. Thus, the decision function can be written as $\mathrm{sgn}(\mathbf{w}^T \cdot \mathbf{x} + b)$. If the returned sign is positive, then $\mathbf{x}$ belongs to the class above the separating hyperplane.

In this work, we use an SVM classifier with a linear kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \mathbf{x}_i, \mathbf{x}_j \rangle$. We do so to simplify the computation with vectors of encrypted features, as presented in the upcoming section.
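The linear decision function can be sketched in a few lines of plain Python. The two-point training set, its multipliers and the offset below are a hand-solved toy instance chosen for illustration (they are not taken from NSL-KDD), and the helper names are hypothetical.

```python
def normal_vector(alphas, labels, vectors):
    """w = sum_i alpha_i * y_i * x_i (only support vectors have alpha_i > 0)."""
    dim = len(vectors[0])
    w = [0.0] * dim
    for a, y, x in zip(alphas, labels, vectors):
        for j in range(dim):
            w[j] += a * y * x[j]
    return w

def decide(w, b, x):
    """Return +1 or -1 according to sgn(w^T . x + b)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score > 0 else -1

# Toy problem: (2, 2) is labelled +1 and (0, 0) is labelled -1.
# The max-margin solution is w = (0.5, 0.5), b = -1, with both multipliers at 0.25.
alphas, labels = [0.25, 0.25], [1, -1]
vectors = [(2.0, 2.0), (0.0, 0.0)]
b = -1.0

w = normal_vector(alphas, labels, vectors)
above = decide(w, b, (3.0, 1.0))   # w.x + b = 1.0, positive side
below = decide(w, b, (0.5, 0.5))   # w.x + b = -0.5, negative side
```

Because the kernel is linear, classification needs only one dot product and one addition, which is what makes it cheap to evaluate over encrypted features.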

4.2 Performances

Training. We use Python 3, the scikit-learn library and its SVM package sklearn.svm for the dataset preprocessing and for the SVM algorithm training.

For the training step, we use 80% of the 25192 labelled records of the KDDTrain+_20Percent dataset. Each entry of this dataset is a vector of 42 features. First, we analyse each feature and its range (minimum, maximum and standard deviation). We also ensure that no feature is missing in the dataset. Then, we convert all categorical attributes into numerical ones, using label encoding. We scale all attributes of the dataset, except of course the class attribute, using the MinMaxScaler. Finally, we train an SVM model with a linear kernel.
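The two preprocessing steps, label encoding and min-max scaling, can be sketched in plain Python as follows. This is an illustrative re-implementation rather than the code used for the experiments, which relies on scikit-learn; the three records are made up, and only the column names follow NSL-KDD.

```python
def label_encode(column):
    """Map each category to an integer index (categories sorted alphabetically)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column]

def min_max_scale(column):
    """Rescale numeric values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Made-up records with a few NSL-KDD-style columns.
records = [
    {"duration": 0, "protocol_type": "tcp", "src_bytes": 491},
    {"duration": 2, "protocol_type": "udp", "src_bytes": 146},
    {"duration": 4, "protocol_type": "icmp", "src_bytes": 0},
]

protocol = label_encode([r["protocol_type"] for r in records])
features = list(zip(
    min_max_scale([r["duration"] for r in records]),
    min_max_scale(protocol),
    min_max_scale([r["src_bytes"] for r in records]),
))
```

After this step every feature lies in [0, 1], which matters later: the range of the features determines how large the integers become once they are scaled for encryption.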

Classification. Once the training is finished, we run the prediction on the test set (i.e., the remaining 20% of KDDTrain+_20Percent). We get an accuracy value of 95.653%. Accuracy is defined as the ratio between the number of correctly predicted labels and the total number of test inputs. We also extract the normal vector $\mathbf{w}$ and the offset $b$.

Classification of Encrypted Inputs. To classify a vector $\mathbf{x} = (x_1, \ldots, x_n)$ without revealing any $x_i$, we first encrypt it as $\mathbf{x}_{enc}$ with the API of the BFV encryption scheme (Fan and Vercauteren, 2012b) from the SEAL library:

$$\mathbf{x}_{enc} = (\mathrm{HE.Enc}_{pk}(x_1), \ldots, \mathrm{HE.Enc}_{pk}(x_n)) = \mathrm{HE.Enc}_{pk}(\mathbf{x})$$

Then, using the linear SVM prediction formula, we compute $res_{enc} = \mathbf{w}^T \cdot \mathbf{x}_{enc} + b$, where $\mathbf{w}$ and $b$ are in clear while $\mathbf{x}_{enc}$ is encrypted. We obtain an encrypted result $res_{enc}$. Then, we decrypt it as $res = \mathrm{HE.Dec}_{sk}(res_{enc})$, and we check the sign of $res$ to get the final prediction result. In practice, $\mathbf{x}_{enc}$ corresponds to a private input from a log generator. Meanwhile, $\mathbf{w}$ is the public data of the SIEM. As such, the SIEM computes $res_{enc}$ and sends it to the SOC for decryption.

Figure 4: Prediction accuracy with respect to features scaling from reals to integers. (Axes: scaling factor in {1, 10, 100, 1000, 10000}; classification accuracy in %. Plotted values: 92.478, 95.177, 95.554, 95.614 and 95.653.)

Figure 5: Prediction time with respect to features scaling from reals to integers. (Axes: scaling factor in {1, 10, 100, 1000, 10000}; elapsed time in milliseconds, from 100 to 800.)

In the BFV cryptosystem, a plaintext is encoded as a polynomial of degree $n$ with integer coefficients in $\mathbb{Z}_t$. Meanwhile, a ciphertext is defined as a set of two (or more) polynomials of the same degree $n$ with integer coefficients in $\mathbb{Z}_q$, where $q \gg t$.

To encrypt $\mathbf{x}$ with BFV, we choose an appropriate scaling factor $s$ to transform the real values of $\mathbf{w}$, $\mathbf{x}$ and $b$ into integers. To get an integer $\bar{x}_i$ from a real value $x_i$, we multiply $x_i$ by a scaling factor $s$ ranging in $\{1, 10, 100, 1000, 10000\}$. Then, we truncate the result using the floor operator: $\bar{x}_i = \lfloor s \cdot x_i \rfloor$. Finally, we proceed with the classification of encrypted inputs. Our experiment confirms that, with encrypted integers, we reach the same accuracy as a prediction with real numbers starting from $s = 10000$ (as presented in Figure 4).
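The effect of the scaling factor can be reproduced on plaintext integers, without any encryption, as in the sketch below. The model and the input are made-up values, and scaling the offset by the square of the scaling factor is an assumption made here so that it matches the scale of the dot product of two scaled vectors; the text above does not detail how the offset is scaled.

```python
import math

def to_int(v, s):
    """Scale a real value by s and truncate with the floor operator."""
    return math.floor(v * s)

def integer_score(w, x, b, s):
    """Integer dot product; w and x are scaled by s, so b is scaled by s*s (assumption)."""
    w_int = [to_int(wi, s) for wi in w]
    x_int = [to_int(xi, s) for xi in x]
    return sum(wi * xi for wi, xi in zip(w_int, x_int)) + to_int(b, s * s)

# Made-up model and min-max scaled input, for illustration only.
w, b = [0.82371, -1.37142, 0.05213], -0.21357
x = [0.40123, 0.10457, 0.90311]

real_score = sum(wi * xi for wi, xi in zip(w, x)) + b   # slightly positive
signs = {s: (1 if integer_score(w, x, b, s) > 0 else -1)
         for s in (1, 10, 100, 1000, 10000)}
```

On this input, which lies close to the separating hyperplane, the small scaling factors truncate away too much precision and flip the predicted sign, while the larger ones agree with the real-valued prediction. This is the same trend as the accuracy curve of Figure 4.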

Figure 5 shows that the prediction time for a single NSL-KDD input ranges from 0.7ms to 5.5ms with respect to the scaling factor. Indeed, for each $s \in \{1, 10, 100, 1000, 10000\}$, each plaintext attribute $w_i$ is encoded as a polynomial with integer coefficients of size $\lceil \log_2(t) \rceil$ in $\{8, 12, 18, 25, 32\}$ (as presented in Table 1). Meanwhile, each attribute $\mathrm{HE.Enc}_{pk}(x_i)$ of $\mathbf{x}_{enc}$ is encoded as a polynomial with integer coefficients of size $\lceil \log_2(q) \rceil$. The size of the coefficients impacts the computation of the dot product between the ciphertext vector $\mathbf{x}_{enc}$ and the plaintext vector $\mathbf{w}$. Indeed, for the same security level (128 bits), the size of the coefficients ($\lceil \log_2(q) \rceil$) and the degree ($n$) of the ciphertext polynomial depend on the size of the coefficients of the plaintext polynomial ($\lceil \log_2(t) \rceil$), as presented in Table 1. The larger the coefficients, the longer the computation time.

We can also make encrypted predictions where not only $\mathbf{x}$ is encrypted but $\mathbf{w}$ and $b$ too. Computing $res_{enc}$ from totally encrypted inputs is interesting when two SIEMs, managed by different providers, collaborate in alert classification. However, these SIEMs can require that their respective models $\mathbf{w}$ remain secret due to intellectual property (IP) concerns. In this case, SIEM1 encrypts and sends $\mathbf{w1}_{enc}$ and $b1_{enc}$ to SIEM2. Then, SIEM2 computes $res_{enc} = \mathbf{w1}_{enc}^T \cdot \mathbf{x}_{enc} + b1_{enc}$.

Figure 5 shows that the prediction time increases drastically when all data are encrypted homomorphically. The prediction time ranges from 55 ms to 725 ms depending on the size of the plaintext coefficients (and so, on the scaling factor). In addition, the size of the plaintext coefficients impacts the polynomial degree $n$ and the ciphertext modulus $q$ in SEAL. Table 1 summarizes the values of $n$ and $q$ with respect to $t$ for a security level of 128 bits. Note that SEAL fixes the security parameters using the homomorphicencryption.org security standard.

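Table 1: BFV parameters in SEAL.

| Prediction equation | $n$ | $\lceil \log_2(t) \rceil$ | $\lceil \log_2(q) \rceil$ |
|---|---|---|---|
| $res_{enc} = w \cdot x_{enc} + b$ | 1024 | 8 | 27 |
| | 2048 | | |
| $res_{enc} = w_{enc} \cdot x_{enc} + b_{enc}$ | 2048 | | |
| | 4096 | 18 | 109 |
| | 8192 | | 218 |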
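The dependence of the plaintext modulus on the scaling factor can be made concrete with a short sketch. It relies on a general requirement rather than on the exact procedure used in the experiments: the signed prediction result must not wrap around modulo $t$, so $t$ has to exceed twice the largest possible absolute value of the result. The model, the normalisation of the inputs to $[0, 1]$ and the scaling of the bias by $s^2$ are assumptions, and the printed sizes are not meant to reproduce the values of Table 1.

```python
import math

def score_bound(w_int, b_int, x_max):
    """Upper bound on |w_int . x_int + b_int| when every |x_int[i]| <= x_max."""
    return sum(abs(wi) for wi in w_int) * x_max + abs(b_int)

def plaintext_bits(bound):
    """Smallest k such that 2**k > 2 * bound.

    A modulus t of at least k bits gives every signed value in
    [-bound, bound] a unique centered representative modulo t.
    """
    return (2 * bound).bit_length()

# Hypothetical real-valued model; inputs are assumed normalised to [0, 1].
w = [0.8312, -1.2045, 0.0931]
b = 0.3

for s in (1, 10, 100, 1000, 10000):
    w_int = [math.floor(s * wi) for wi in w]
    b_int = math.floor(s * s * b)      # assumption: bias scaled by s*s
    bits = plaintext_bits(score_bound(w_int, b_int, x_max=s))
    print(f"s = {s:>5}: at least {bits} bits for t")
```

Each tenfold increase of the scaling factor adds several bits to the required plaintext modulus, which in turn pushes the scheme towards a larger ciphertext modulus and polynomial degree.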

5 CONCLUSION

In this work, we specified a privacy-preserving log management architecture based on homomorphic encryption. We made the SIEM analyse encrypted log inputs using a linear SVM classifier. The preliminary experimental results, obtained with the NSL-KDD dataset and the SEAL homomorphic encryption library, are promising in terms of accuracy.

In the future, we plan to improve the execution time of the prediction over encrypted data. First, we will investigate the use of the plaintext space of the BFV scheme to compute the prediction results of many inputs simultaneously. Second, we will use other homomorphic encryption schemes, namely TFHE (Chillotti et al., 2016) or CKKS (Cheon et al., 2016), for our classification. TFHE and CKKS are interesting as they use floating-point numbers. Third, we intend to implement and test other classification algorithms (e.g., neural networks) for the analysis of the encrypted logs.
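The idea of computing many prediction results simultaneously can be illustrated in a simplified setting. BFV batching packs several values into the slots of one plaintext polynomial; the sketch here shows the same principle on plain integers with the Chinese Remainder Theorem, where one packed value carries one residue per slot and every addition or multiplication acts on all slots at once. The slot moduli, the model and the inputs are hypothetical, and no encryption is performed.

```python
from math import prod

MODULI = (65535, 65536, 65537)   # pairwise coprime slot moduli (illustrative)
M = prod(MODULI)

def pack(values):
    """Combine one value per slot into a single integer modulo M (CRT)."""
    total = 0
    for v, m in zip(values, MODULI):
        others = M // m
        total += (v % m) * others * pow(others, -1, m)
    return total % M

def unpack(packed):
    """Recover the signed value held in each slot."""
    out = []
    for m in MODULI:
        r = packed % m
        out.append(r - m if r > m // 2 else r)
    return out

# Hypothetical integer model and three inputs classified at once.
w = [3, -2, 1]
b = 4
inputs = [[1, 2, 3], [4, 0, -1], [-5, 6, 2]]

# One packed value per attribute: slot k holds attribute j of input k.
columns = [pack([row[j] for row in inputs]) for j in range(len(w))]

# A single pass over the attributes scores all inputs simultaneously.
packed_score = (sum(wj * cj for wj, cj in zip(w, columns))
                + pack([b] * len(inputs))) % M
print(unpack(packed_score))
```

With packing, the number of homomorphic operations depends on the number of attributes rather than on the number of inputs, which is the source of the expected gain in amortised prediction time.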

REFERENCES

Aburomman, A. A. and Reaz, M. B. I. (2017). A survey of intrusion detection systems based on ensemble and hybrid classifiers. Computers & Security, 65:135–152.

Acar, A., Aksu, H., Uluagac, A. S., and Conti, M. (2017). A survey on homomorphic encryption schemes: Theory and implementation. arXiv preprint arXiv:1704.03578.

Allison, J., Evans, J., Filkens, B., Moye, O., Northcutt, S., Read, J., Torres, A., and Wityszyn, M. (2013). The 6 Categories of Critical Log Information.

Beng, L. Y., Ramadass, S., Manickam, S., and Fun, T. S. (2013). A comparative study of alert correlations for intrusion detection. In 2013 International Conference on Advanced Computer Science Applications and Technologies, pages 85–88.

Boura, C., Gama, N., Georgieva, M., and Jetchev, D. (2018). Chimera: Combining Ring-LWE-based fully homomorphic encryption schemes. Cryptology ePrint Archive, Report 2018/758. https://eprint.iacr.org/2018/758.

Bourse, F., Minelli, M., Minihold, M., and Paillier, P. (2017). Fast homomorphic evaluation of deep discretized neural networks. Cryptology ePrint Archive, Report 2017/1114. https://eprint.iacr.org/2017/1114.

Brakerski, Z., Gentry, C., and Vaikuntanathan, V. (2012). (Leveled) Fully Homomorphic Encryption Without Bootstrapping. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS '12, pages 309–325.

Brakerski, Z. and Vaikuntanathan, V. (2011). Fully Homomorphic Encryption from Ring-LWE and Security for Key Dependent Messages. In CRYPTO, volume 6841 of Lecture Notes in Computer Science, pages 505–524. Springer.

Canteaut, A., Carpov, S., Fontaine, C., Lepoint, T., Naya-Plasencia, M., Paillier, P., and Sirdey, R. (2015). Stream ciphers: A practical solution for efficient homomorphic-ciphertext compression. Cryptology ePrint Archive, Report 2015/113. https://eprint.iacr.org/2015/113.

Cheon, J. H., Kim, A., Kim, M., and Song, Y. (2016). Homomorphic encryption for arithmetic of approximate numbers. Cryptology ePrint Archive, Report 2016/421. https://eprint.iacr.org/2016/421.

Chillotti, I., Gama, N., Georgieva, M., and Izabachène, M. (2016). Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds. In Advances in Cryptology–ASIACRYPT 2016: 22nd International Conference on the Theory and Application of Cryptology and Information Security, Hanoi, Vietnam, December 4-8, 2016, Proceedings, Part I, pages 3–33. Springer.

Dali, L., Bentajer, A., Abdelmajid, E., Abouelmehdi, K., Elsayed, H., Fatiha, E., and Abderahim, B. (2015). A survey of intrusion detection system. In 2015 2nd World Symposium on Web Applications and Networking (WSWAN), pages 1–6.

European Parliament and Council (2016). REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).

Fan, J. and Vercauteren, F. (2012a). Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive, 2012:144.

Fan, J. and Vercauteren, F. (2012b). Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive, Report 2012/144. https://eprint.iacr.org/2012/144.

Gentry, C. et al. (2009). Fully homomorphic encryption using ideal lattices. In STOC, volume 9, pages 169–178.

Jarpey, G. and McCoy, R. S. (2017). Chapter 1 - What is a security operations center? In Jarpey, G. and McCoy, R. S., editors, Security Operations Center Guidebook, pages 3–10. Butterworth-Heinemann, Boston.

Karush, W. (2014). Minima of Functions of Several Variables with Inequalities as Side Conditions, pages 217–245. Springer Basel, Basel.

Kuhn, H. W. and Tucker, A. W. (1951). Nonlinear programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, pages 481–492, Berkeley, Calif. University of California Press.

Limmer, T. and Dressler, F. (2008). Survey of event correlation techniques for attack detection in early warning systems.

Lin, X., Wang, P., and Wu, B. (2013). Log analysis in cloud computing environment with Hadoop and Spark. In 2013 5th IEEE International Conference on Broadband Network Multimedia Technology, pages 273–276.


López-Alt, A., Tromer, E., and Vaikuntanathan, V. (2012). On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 1219–1234. ACM.

Nathans, D. (2015). Chapter 1 - Efficient operations: Building an operations center from the ground up. In Nathans, D., editor, Designing and Building Security Operations Center, pages 1–24. Syngress.

Ray, I., Belyaev, K., Strizhov, M., Mulamba, D., and Rajaram, M. (2013). Secure logging as a service: delegating log management to the cloud. IEEE Systems Journal, 7(2):323–334.

Rivest, R. L., Adleman, L., and Dertouzos, M. L. (1978). On data banks and privacy homomorphisms. Foundations of secure computation, 4(11):169–180.

Scarfone, K. K. and Souppaya, M. P. (2006). NIST Special Publication - 800-92: Guide to Computer Security Log Management.

Schölkopf, B. and Smola, A. J. (2003). A Short Introduction to Learning with Kernels, pages 41–64. Springer Berlin Heidelberg, Berlin, Heidelberg.

SEAL (2019). Microsoft SEAL (release 3.3). https://github.com/Microsoft/SEAL. Microsoft Research, Redmond, WA.

Tavallaee, M., Bagheri, E., Lu, W., and Ghorbani, A. A. (2009). A detailed analysis of the KDD CUP 99 data set. In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pages 1–6.

Van Dijk, M., Gentry, C., Halevi, S., and Vaikuntanathan, V. (2010). Fully homomorphic encryption over the integers. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 24–43. Springer.
