Homomorphic Encryption at Work for Private Analysis of Security Logs
Aymen Boudguiga¹, Oana Stan¹, Hichem Sedjelmaci² and Sergiu Carpov¹
¹CEA-LIST, 91191, Gif-sur-Yvette, France
²Orange Labs, 92320, Châtillon, France
Keywords: Privacy, Log Management, SIEM, Homomorphic Encryption.
Abstract:
One important component of incident handling in cyber-security is log management. In practice, different software and/or hardware components of a system, such as Intrusion Detection Systems (IDS) or firewalls, analyze network traffic and log suspicious events or activities. These logs are timestamped, gathered by a log collector and centralized within a log analyzer. A Security Incidents and Events Management (SIEM) system is an example of such a log analysis tool. A SIEM can be a dedicated network device or a Cloud service offered by a security services provider. Providing SIEM as a cloud service raises privacy issues, as logs contain confidential information that must not be disclosed to third parties. In this work, we investigate the use of homomorphic encryption to provide a privacy-preserving log management architecture. We explain how a SIEM can be adapted to process encrypted logs. In addition, we evaluate the homomorphic classification of IDS alerts from the NSL-KDD dataset with a linear SVM model.
1 INTRODUCTION
Nowadays, Security Incidents and Events Management (SIEM) systems (Scarfone and Souppaya, 2006) are established as a reference for cyber-security log analysis. A SIEM can be a dedicated network device or a Cloud service offered by a security services provider. A SIEM receives and processes logs in quasi real time (i.e., online), but it may also support offline log processing (Limmer and Dressler, 2008). It is mainly in charge of log normalization, analysis, correlation and storage. In practice, a SIEM either responds to detected incidents with appropriate countermeasures or generates automated alerts about malicious activities for security administrators within a Security Operations Center (SOC) (Jarpey and McCoy, 2017; Nathans, 2015).
The SOC provides a dashboard interface for incident visualization, which facilitates incident monitoring by incident analysts. The latter review the automated analysis of the SIEM and scrutinize its alerts. They are in charge of attack forensics, log storage, remote policy enforcement and incident response. That is, the SOC deploys security patches and adequate countermeasures on vulnerable devices. SOCs are bound by law, and SOC employees face imprisonment if they disclose any information about the analyzed logs of their clients.
Problem Statement – For accessibility and ease of management, SIEM can be provided by third parties as distributed services in the Cloud. When an IDS sends its logs to a remote SIEM to signal an incident on a device, these logs can be intercepted by an attacker. The attacker can then exploit the weakness revealed by the reported incident to attack the vulnerable device. This attack becomes more dangerous when the reported incident concerns a well-known device that is deployed in many systems simultaneously. For example, an IDS can report the presence of a malware that discloses the logins and passwords of a database's users. If the attacker knows that many companies are using the same vulnerable database, she can target all these companies' databases with the same malware. Consequently, it is essential to ensure log confidentiality and integrity during transmission to a third-party SIEM. Fortunately, appropriate encryption and message authentication codes can solve this problem.
However, when SIEMs are managed by honest-but-curious third parties, classical encryption is not enough to preserve log privacy. Indeed, the SIEM can decrypt the received logs, analyze them and send incident responses back to the log generators. Meanwhile, a curious SIEM can take advantage of the decrypted logs to gather information about the targeted device and its hosting system. This information can serve to build target profiles that interest security service providers. Alternatively, the SIEM can undermine a company's brand image by selling its alert reports to competitors. In this context, homomorphic encryption is an attractive alternative to classical encryption for providing log privacy: the SIEM analyzes encrypted data without accessing sensitive information about the origin and nature of the incident.
Boudguiga, A., Stan, O., Sedjelmaci, H. and Carpov, S.
Homomorphic Encryption at Work for Private Analysis of Security Logs.
DOI: 10.5220/0008969205150523
In Proceedings of the 6th International Conference on Information Systems Security and Privacy (ICISSP 2020), pages 515-523
ISBN: 978-989-758-399-5; ISSN: 2184-4356
Copyright © 2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Homomorphic encryption is also useful for forensics. Let us consider an incident that targets a device deployed in many systems simultaneously. The log generator encrypts its logs with a homomorphic scheme to prevent the leakage of sensitive information during the investigation of the incident. As such, an adversary with access to the encrypted logs gets no information that would allow her to run a massive attack on the many companies using the same vulnerable device.
Finally, current regulation efforts regarding private data management, such as the EU 2016/679 General Data Protection Regulation (GDPR)¹ (European Parliament and Council, 2016), are incentives for applying homomorphic encryption to log privacy. Indeed, GDPR Article 4.1 considers log content as personal data, and the consent of log generators is compulsory for log collection and classification. However, GDPR Recital 49 indicates that consent is not necessary if data processing is vital to provide system security against malicious activities. Homomorphic encryption offers a good compromise between log privacy and log processing for malicious activity detection.
Contributions In this work, we investigate the use of homomorphic encryption in log management to protect private data. We first characterize the sensitive fields of a log entry that must not be disclosed to an adversary. Then, we make log generators encrypt their logs, with a symmetric scheme that the log analyzer (i.e., the SIEM) later transciphers into homomorphic ciphertexts, before transmitting them. The SIEM processes the encrypted logs and transmits the result to the SOC. The SOC only has to decrypt the received result to obtain the SIEM analysis report. Finally, we study the feasibility of our proposal by simulating a homomorphic log analysis with the NSL-KDD dataset.
Paper Organization Section 2.1 reviews the main components of a log management architecture. Section 2.2 presents related work on privacy-preserving log management. Section 3 specifies our privacy-preserving log management protocol, which relies on homomorphic encryption. Section 4 discusses the performance of our proposal. Finally, Section 5 concludes the paper and outlines future improvements.
¹ The EU GDPR has been applicable in all EU member states since May 2018 to harmonize privacy laws across Europe.
[Diagram: devices, gateways (G), the Internet and the Cloud connected by communication links; log management stages Log, Collect, Normalize, Analyze, Correlate, Store, Remove, Monitor, Visualize, Investigate and Respond, distributed across the log generator (IDS, FW...), the log analyzer (SIEM) and the log monitor (SOC).]
Figure 1: Log management components.
2 BACKGROUND AND TOOLS
In this section, we describe the log management architecture. In addition, we review the state of the art on privacy-preserving log management. Finally, we introduce homomorphic encryption and the notations followed in this work.
2.1 Log Management Architecture
In this section, we review the main components of a log management architecture. To do so, we consider the abstract network architecture² described in Figure 1. It contains a set of heterogeneous devices. A device supports several types of communication: Device-to-Device, Device-to-Cloud and Device-to-Gateway. Some devices, such as intrusion detection systems (IDS), firewalls, gateways or servers, create logs to store information about their internal state or to monitor the state of their hosting network. We refer to these devices as log generators (Scarfone and Souppaya, 2006). Log generators are synchronized to ensure that the logs they produce are correctly timestamped. Indeed, timestamping is required to reorder logged events during incident investigation.
According to (Allison et al., 2013), the most critical information to log relates to: authentication and authorization, system configuration, network traffic, assets, malware and critical system failures. In this paper, we only consider alerts and logs coming from IDS for the sake of simplicity. IDS report malicious and suspicious activities detected on a host machine (HIDS) or a network (NIDS) (Dali et al., 2015). Note that in practice, IDS implement a signature-based and/or an anomaly-based detection policy. Signature-based IDS rely on a set of predefined rules and patterns to detect misbehavior. Meanwhile, anomaly-based IDS use machine learning algorithms to detect behavior deviations from a reference state (Aburomman and Reaz, 2017).
² Note that the considered architecture covers major IoT use cases: smart homes, smart grids, factory 4.0 and intelligent transportation systems.
ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy
The log analyzer, namely the SIEM, can be agentless or agent-based (Scarfone and Souppaya, 2006). An agentless SIEM receives logs from log generators without requiring a dedicated log transfer software to be installed on them. Meanwhile, an agent-based SIEM relies on clients installed in log generators that transmit logs periodically. The SIEM is in charge of log normalization. Indeed, the SIEM manages different types of logs with inconsistent content, format and timestamps. Normalization rewrites these logs in a single common form. It extracts features from the logs, such as the transport protocol type (e.g., TCP or UDP), port numbers, IP addresses and timestamps. These features are the serialization tokens of the new log format.
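A minimal sketch of the normalization step, in Python (the language used in Section 4), assuming a hypothetical syslog-like alert format. The regular expression, field names and sample line are illustrative assumptions and do not correspond to a specific IDS product.

```python
import re
from datetime import datetime, timezone

# Hypothetical raw alert format (an assumption, not a real IDS format).
RAW_PATTERN = re.compile(
    r"(?P<ts>\S+) ALERT proto=(?P<proto>\w+) "
    r"src=(?P<src_ip>[\d.]+):(?P<src_port>\d+) "
    r"dst=(?P<dst_ip>[\d.]+):(?P<dst_port>\d+) msg=(?P<msg>\S+)"
)


def normalize(raw_line):
    """Rewrite one raw alert line into a record in a common format."""
    match = RAW_PATTERN.match(raw_line)
    if match is None:
        raise ValueError(f"unrecognized log format: {raw_line!r}")
    f = match.groupdict()
    ts = datetime.strptime(f["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return {
        "timestamp": int(ts.timestamp()),  # Unix time, used to reorder events
        "protocol": f["proto"].upper(),    # e.g. "tcp" -> "TCP"
        "src_ip": f["src_ip"],
        "src_port": int(f["src_port"]),
        "dst_ip": f["dst_ip"],
        "dst_port": int(f["dst_port"]),
        "message": f["msg"],
    }


record = normalize("2019-11-05T10:42:07Z ALERT proto=tcp src=203.0.113.7:51234 "
                   "dst=198.51.100.20:22 msg=ssh-bruteforce")
print(record["protocol"], record["dst_port"])  # TCP 22
```

Each key of the returned dictionary plays the role of one serialization token of the common format; a real SIEM would maintain one such parser per log source.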
The SIEM analyzes log content by looking for patterns of simple attacks using a rule-based or a learning-based policy. In addition, the SIEM correlates logs in order to reduce the number of false positives, to remove redundant alerts and to capture linked alerts that belong to the same attack pattern (Beng et al., 2013). By false positives, we refer to events that are not relevant for the analysis or to erroneous events signaled as threats. A simple correlation algorithm consists in clustering logged events by comparing the similarity of their features (e.g., same source IP address and port). Other correlation algorithms rely on machine learning with a training phase to detect complex alerts.
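The simple similarity-based correlation mentioned above can be sketched as grouping normalized events by a key built from selected features. The key used here (source IP address and destination port) and the sample events are assumptions for illustration.

```python
from collections import defaultdict


def correlate(events, keys=("src_ip", "dst_port")):
    """Cluster normalized events that share the same values for `keys`.

    Redundant alerts collapse into one cluster, and each cluster gathers
    alerts that plausibly belong to the same attack pattern.
    """
    clusters = defaultdict(list)
    for event in events:
        clusters[tuple(event[k] for k in keys)].append(event)
    return dict(clusters)


events = [
    {"src_ip": "203.0.113.7", "dst_port": 22, "timestamp": 100},
    {"src_ip": "203.0.113.7", "dst_port": 22, "timestamp": 105},
    {"src_ip": "192.0.2.44", "dst_port": 80, "timestamp": 110},
]
for key, group in correlate(events).items():
    print(key, len(group))
# ('203.0.113.7', 22) 2
# ('192.0.2.44', 80) 1
```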
Furthermore, the SIEM is in charge of log storage and disposal. Logs can be centralized at one SIEM or distributed over many SIEMs. In addition, a SIEM can be installed in a local area network or provided as a Cloud service by a security provider (Figure 1). Distributing and installing the SIEM in the cloud is more advantageous than local centralization. Indeed, a Cloud SIEM thwarts the risk of log loss during an attack against the local network. It backs up and protects logs even if their generators are compromised, their network is attacked and their local log databases are cleared.
The SIEM interacts with a Security Operations Center (SOC), i.e., the log monitor (Scarfone and Souppaya, 2006), to define appropriate reactions to incidents detected on devices. The SOC relies on a team of incident analysts to manage forensics tasks, the deployment of security countermeasures and updates, and incident reporting. In addition, the SOC provides a dashboard for alert visualization. In this work, we make a clear distinction between SIEM and SOC functions, as presented in Figure 1. The SIEM is a software service which runs automated processes. Meanwhile, the SOC is an infrastructure that gathers security experts who are in charge of investigating and responding to security threats. In the literature, SIEM and SOC are sometimes grouped into a single entity (referred to as SIEM).
2.2 Private Log Analysis
With the democratization of the cloud and the increasing complexity of SIEM tools, a natural solution is to migrate such systems to the cloud. As such, log analysis can be offloaded and made available as a SaaS (Software as a Service) on a remote server or a Cloud platform (e.g., (Lin et al., 2013)). In such a context, issues arise regarding the security of log transmission, processing and storage.
Some commercial SIEM solutions, such as IBM QRadar, store the received logs obfuscated and hashed. Other SIEMs allow log encryption, using a symmetric scheme such as AES, to secure data transmission between their software components. Meanwhile, the approaches based on Syslog (e.g., syslog-sign, syslog-pseudo, reliable-syslog), the standard for network-wide logging, do not ensure log confidentiality in transit or during processing.
In (Ray et al., 2013), the authors propose secure protocols for the anonymous upload, retrieval and deletion of log data over the Tor network. However, their logging client depends on the chosen operating system, and log privacy is not fully addressed since logs can be identified through their tag values.
To the best of our knowledge, there are no existing works on the confidentiality of log data, both during transmission and analysis, using homomorphic encryption techniques.
2.3 Homomorphic Encryption
Homomorphic Encryption (HE) schemes allow computations to be performed directly over encrypted data. That is, with a fully homomorphic encryption scheme $E$, we can compute $E(m_1 + m_2)$ and $E(m_1 \times m_2)$ from the encrypted messages $E(m_1)$ and $E(m_2)$. The first constructions of HE schemes, allowing either multiplication or addition over encrypted data, date back to the seventies (Rivest et al., 1978). Then, in 2009, Gentry (Gentry et al., 2009) proposed the first Fully Homomorphic Encryption (FHE) scheme able to evaluate an arbitrary number of additions and multiplications over encrypted data.
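As a small illustration of a scheme that supports only one operation, unpadded ("textbook") RSA is multiplicatively homomorphic: multiplying two ciphertexts yields an encryption of the product of the plaintexts modulo n. The toy parameters below are far too small to be secure and only serve to show the property.

```python
# Toy textbook RSA with tiny, insecure parameters (illustration only).
p, q = 61, 53
n = p * q                # 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)


def enc(m):
    return pow(m, e, n)


def dec(c):
    return pow(c, d, n)


m1, m2 = 7, 11
c1, c2 = enc(m1), enc(m2)
# (m1^e) * (m2^e) = (m1 * m2)^e mod n: the product of ciphertexts
# decrypts to the product of plaintexts.
c_prod = (c1 * c2) % n
print(dec(c_prod))  # 77
```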
Since Gentry's breakthrough, many Somewhat HE and FHE schemes have been proposed in the literature (Brakerski et al., 2012; Fan and Vercauteren, 2012a; Van Dijk et al., 2010; López-Alt et al., 2012; Chillotti et al., 2016; Cheon et al., 2016). In (Acar et al., 2017), FHE schemes are classified into four main families: ideal lattice-based schemes (Gentry et al., 2009), schemes over integers (Van Dijk et al., 2010), schemes based on the Learning With Errors (LWE) problem or its ring variant (RLWE) (Brakerski and Vaikuntanathan, 2011; Brakerski et al., 2012; Chillotti et al., 2016; Cheon et al., 2016) and NTRU-like schemes (López-Alt et al., 2012).
In practice, a public-key homomorphic encryption scheme HE = (HE.Keygen, HE.Enc, HE.Dec, HE.Eval) is defined by the following probabilistic polynomial-time algorithms with respect to the security parameter $k$:
• $(pk, evk, sk) \leftarrow \mathrm{HE.Keygen}(1^k)$: outputs a public encryption key $pk$, a public evaluation key $evk$ and a secret decryption key $sk$. The evaluation key is used during homomorphic operations: $evk$ corresponds to the relinearization key in levelled homomorphic schemes such as BFV (Fan and Vercauteren, 2012a) or to the bootstrapping key in gate-bootstrapped schemes such as TFHE (Chillotti et al., 2016).
• $c \leftarrow \mathrm{HE.Enc}_{pk}(m)$: encrypts a message $m$ into a ciphertext $c$ using the public key $pk$.
• $m \leftarrow \mathrm{HE.Dec}_{sk}(c)$: decrypts a ciphertext $c$ into a plaintext $m$ using the secret key $sk$.
• $c_f \leftarrow \mathrm{HE.Eval}_{evk}(f, c_1, \ldots, c_k)$: evaluates the function $f$ on the encrypted inputs $c_1, \ldots, c_k$ using the evaluation key $evk$.
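To make the four algorithms concrete, the following sketch instantiates the (Keygen, Enc, Dec, Eval) interface with a toy Paillier cryptosystem. This is a swapped-in, only additively homomorphic scheme, not BFV or TFHE: Eval is limited to adding ciphertexts and multiplying them by cleartext integers, and the "evaluation key" is simply the public modulus. The function names and hard-coded primes are assumptions of this sketch, and the parameters are insecure.

```python
import math
import secrets

# Toy Paillier scheme mapped onto HE = (Keygen, Enc, Dec, Eval).
# Tiny hard-coded primes: for illustration only, NOT secure.


def keygen(p=1009, q=1013):
    n = p * q
    lam = math.lcm(p - 1, q - 1)  # Carmichael function of n (Python 3.9+)
    mu = pow(lam, -1, n)          # valid because we use the generator g = n + 1
    pk = n                        # public encryption key
    evk = n                       # Eval only needs the public modulus here
    sk = (lam, mu, n)             # secret decryption key
    return pk, evk, sk


def enc(pk, m):
    n, n2 = pk, pk * pk
    while True:                   # pick a random r coprime with n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2  # g^m * r^n mod n^2


def dec(sk, c):
    lam, mu, n = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n  # L(u) * mu mod n, with L(u) = (u - 1) / n


def eval_add(evk, c1, c2):
    return (c1 * c2) % (evk * evk)    # decrypts to m1 + m2 mod n


def eval_scalar(evk, c, k):
    return pow(c, k % evk, evk * evk)  # decrypts to k * m mod n


pk, evk, sk = keygen()
print(dec(sk, eval_add(evk, enc(pk, 20), enc(pk, 22))))  # 42
print(dec(sk, eval_scalar(evk, enc(pk, 5), 3)))          # 15
```

Encryption is randomized (two encryptions of the same message differ), which is why HE.Enc is listed above as a probabilistic algorithm.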
Several FHE schemes are now available (e.g., BFV, TFHE, CKKS (Cheon et al., 2016)), and they can be combined using the CHIMERA framework (Boura et al., 2018). These schemes have timing performance that already allows interesting use cases, such as the evaluation of simple neural networks (Bourse et al., 2017; ?).
To reduce the overhead induced by the size of homomorphic ciphertexts during transmission and storage, we can use transciphering (Canteaut et al., 2015). This cryptographic technique switches the encryption of the data from a classical symmetric scheme to an HE scheme, without decrypting the data. Let $m$ be a plaintext, SYM a symmetric scheme with key $k$, $\mathrm{SYM.Enc}_k(m)$ the encryption of $m$ with SYM, and HE a homomorphic encryption scheme. With transciphering, it suffices to evaluate the decryption circuit SYM.Dec in the homomorphic domain, using the homomorphic encryption of the symmetric key $\mathrm{HE.Enc}_{pk}(k)$, to obtain the message encrypted under $pk$:
$$\mathrm{HE.Eval}_{evk}\left(\mathrm{SYM.Dec}_{\mathrm{HE.Enc}_{pk}(k)}\left(\mathrm{SYM.Enc}_k(m)\right)\right) = \mathrm{HE.Enc}_{pk}(m)$$
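The transciphering equation can be illustrated with a deliberately simple symmetric scheme whose decryption circuit is a single subtraction: a one-time additive pad over $\mathbb{Z}_N$, $\mathrm{SYM.Enc}_k(m) = m + k \bmod N$. Evaluating SYM.Dec homomorphically then only needs an additively homomorphic scheme, so the sketch uses a toy Paillier scheme as HE. Both choices are assumptions for illustration: the HE-friendly ciphers of (Canteaut et al., 2015) have much more complex decryption circuits, and this pad must never be reused for two messages.

```python
import math
import secrets

# Toy Paillier scheme used as HE (tiny primes, NOT secure).
P, Q = 1009, 1013
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)  # Python 3.9+
MU = pow(LAM, -1, N)


def he_enc(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def he_dec(c):
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


# Hypothetical HE-friendly symmetric scheme: one-time additive pad over Z_N.
def sym_keygen():
    return secrets.randbelow(N)


def sym_enc(k, m):
    return (m + k) % N


def transcipher(sym_ct, he_k):
    """Evaluate SYM.Dec_k(c) = c - k mod N under HE, without learning m or k.

    HE.Enc(c) * HE.Enc(k)^(-1) mod N^2 decrypts to c - k mod N = m.
    """
    return (he_enc(sym_ct) * pow(he_k, -1, N2)) % N2


k = sym_keygen()
log_field = 4242                   # e.g. an integer-encoded log field
sym_ct = sym_enc(k, log_field)     # computed by the log generator
he_k = he_enc(k)                   # HE.Enc_pk(k), sent by the log generator
he_ct = transcipher(sym_ct, he_k)  # computed by the SIEM
print(he_dec(he_ct))               # 4242, as decrypted by the SOC
```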
2.4 Notations
In the following sections, we denote vectors by bold letters, for example $\mathbf{x}$. Each vector $\mathbf{x}$ of $n$ elements can be represented as $\mathbf{x} = (x_1, \ldots, x_n)$. The transpose of a vector $\mathbf{x}$ is denoted $\mathbf{x}^t$. As such, the dot product between two vectors $\mathbf{x}$ and $\mathbf{y}$ is expressed as $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^t \cdot \mathbf{y}$.
3 HE-BASED LOG ANALYSIS
In this section, we first present the threat model we consider. Then, we discuss the requirements of log management to support homomorphic encryption operations. Finally, we specify our privacy-preserving scheme for log exchange and processing.
3.1 Threat Model
In this work, we consider an honest-but-curious model. In this model, $n$ entities $(e_1, \ldots, e_n)$, holding secret information $(s_1, \ldots, s_n)$, participate in a protocol $P$ to compute some function $F(s_1, \ldots, s_n)$. Each entity $e_i$, $i \in [1, n]$, is honest and follows each step of $P$. However, $e_i$ is curious: it tries to find information about the secrets $s_j$, $j \neq i$, of the other entities. $P$ is secure in the honest-but-curious model if, at the end of the protocol, each $e_i$ has no information other than $F(s_1, \ldots, s_n)$.
In this work, log generators and the log analyzer (SIEM) are assumed to be honest-but-curious, while the log monitor (SOC) is assumed to be a trusted entity. A log generator may be interested in the features of other log generators. Meanwhile, the SIEM can be interested in recovering all the private features of log generators. The SOC, considered an honest trusted party, generates the public, evaluation and secret keys ($pk$, $evk$ and $sk$, respectively). Then, the SOC shares $pk$ with the SIEM and the log generator. In addition, it provides $evk$ to the SIEM. The latter is in charge of analyzing the log inputs and returns the encrypted evaluation result to the SOC.
We focus here on providing confidential log analysis and do not consider other security properties such as entity authentication or message integrity validation. We consider that "classical"³ cryptographic mechanisms are sufficient to provide such properties.
[Diagram: the log management components of Figure 1 (log generator, log analyzer, log monitor), extended with a Transcipher step.]
Figure 2: Log management components updated for HE support.
3.2 Requirements
In this section, we introduce the modifications needed to use homomorphic encryption in our log management architecture. Our major concern is mitigating the performance bottleneck of existing homomorphic encryption schemes, both in terms of memory usage for log transmission and processing and in terms of execution time for log processing. To do so, we must carefully choose the log fields to be encrypted. In addition, we pay attention to the operations that the SIEM will run over encrypted data. Figure 2 presents the proposed extensions to the log management architecture. They concern the normalization step and the transciphering of log data.
First, we delegate the SIEM log normalization step to log generators. As such, the latter are in charge of rewriting their logs in a common format defined by the SIEM. Consequently, the SIEM is relieved from the cumbersome normalization of partially encrypted files. Second, we propose to encrypt only the sensitive fields of a log entry to reduce the volume of HE-encrypted data. That is, we distinguish between sensitive fields and merely informative fields in a log entry. Sensitive fields can disclose information about a log generator, such as: MAC address, IP address (if public), transport protocols (with their flags) and targeted services (ports). In contrast, informative fields do not provide any information about a log generator unless they are combined with a sensitive field. For example, a packet loss rate is considered an informative field: it is valuable only when combined with a service port and an IP address.
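A minimal sketch of the split between sensitive and informative fields, applied to a normalized record. The field names and the sensitive set are assumptions derived from the examples above; in particular, this sketch treats every IP address as sensitive instead of checking whether it is public, and treats the timestamp as informative.

```python
# Field names considered sensitive, following the examples above
# (MAC/IP addresses, transport protocol and flags, targeted ports).
SENSITIVE_FIELDS = {"mac", "src_ip", "dst_ip", "protocol", "tcp_flags", "dst_port"}


def split_record(record):
    """Return (sensitive, informative) sub-records of a normalized log entry.

    Only the sensitive part would be symmetrically encrypted and later
    transciphered; the informative part can stay in clear.
    """
    sensitive = {k: v for k, v in record.items() if k in SENSITIVE_FIELDS}
    informative = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    return sensitive, informative


record = {"timestamp": 1572950527, "src_ip": "203.0.113.7", "dst_port": 22,
          "protocol": "TCP", "packet_loss_rate": 0.03}
sensitive, informative = split_record(record)
print(sorted(sensitive))    # ['dst_port', 'protocol', 'src_ip']
print(sorted(informative))  # ['packet_loss_rate', 'timestamp']
```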
Finally, the SIEM transciphers the log files $l$, partially encrypted with a random symmetric key $k$, into files encrypted under the SOC homomorphic public key $pk$. That is, the SIEM computes $\mathrm{HE.Enc}_{pk}(l)$ from the symmetric encryption $\mathrm{SYM.Enc}_k(l)$ using $\mathrm{HE.Enc}_{pk}(k)$, sent by a log generator. Transciphering not only spares log generators the cumbersome homomorphic encryption of their logs, but also saves bandwidth, because $\mathrm{SYM.Enc}_k(l)$ is smaller than $\mathrm{HE.Enc}_{pk}(l)$.
³ By classical, we refer, for example, to message authentication codes that provide message integrity. Nonces can also be added to avoid replay attacks, and counters to detect denial-of-service attacks.
[Protocol diagram between the log generator, the SIEM and the SOC.]
Offline phase
1. SOC: $(pk, evk, sk) = \mathrm{HE.Keygen}(1^k)$; the SOC sends $pk, evk$ to the SIEM and $pk$ to the log generator.
Online phase
1. Log generator: $k = \mathrm{SYM.Keygen}(1^n)$.
2. Log generator to SIEM: $\mathrm{HE.Enc}_{pk}(k)$, $\mathrm{SYM.Enc}_k(l)$.
3. SIEM, transcipher: $\mathrm{HE.Eval}_{evk}\left(\mathrm{SYM.Dec}_{\mathrm{HE.Enc}_{pk}(k)}\left(\mathrm{SYM.Enc}_k(l)\right)\right) = \mathrm{HE.Enc}_{pk}(l)$.
4. SIEM, analyse logs: $\mathrm{HE.Eval}_{evk}\left(A, \mathrm{HE.Enc}_{pk}(l)\right) = \mathrm{HE.Enc}_{pk}(res)$.
5. SIEM to SOC: $\mathrm{HE.Enc}_{pk}(res)$, $\mathrm{HE.Enc}_{pk}(k)$.
6. SOC: $\mathrm{HE.Dec}_{sk}(\mathrm{HE.Enc}_{pk}(res)) = res$ and $\mathrm{HE.Dec}_{sk}(\mathrm{HE.Enc}_{pk}(k)) = k$.
7. SOC: choose a security countermeasure $rec$.
8. SOC to log generator: $\mathrm{SYM.Enc}_k(rec)$.
9. Log generator: $\mathrm{SYM.Dec}_k(\mathrm{SYM.Enc}_k(rec)) = rec$.
Figure 3: HE privacy preserving log management protocol.
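To give an order of magnitude for this bandwidth argument, the sketch below computes the raw size of a fresh BFV ciphertext, i.e., two polynomials with $n$ coefficients of $\lceil \log_2(q) \rceil$ bits each, for the $(n, \lceil \log_2(q) \rceil)$ pairs reported in Table 1. It ignores serialization metadata and any compression performed by SEAL, so actual serialized sizes will differ. For comparison, a stream-cipher encryption of an 8-byte field is 8 bytes long (plus a nonce and a MAC, if any).

```python
import math


def bfv_ciphertext_bytes(n, log2_q, polys=2):
    """Raw size of a fresh BFV ciphertext: `polys` polynomials with n
    coefficients, each stored on log2_q bits (no metadata, no compression)."""
    return math.ceil(polys * n * log2_q / 8)


# (n, ceil(log2 q)) pairs reported in Table 1.
for n, log2_q in [(1024, 27), (2048, 54), (4096, 109), (8192, 218)]:
    print(n, log2_q, bfv_ciphertext_bytes(n, log2_q))
# 1024 27 6912
# 2048 54 27648
# 4096 109 111616
# 8192 218 446464
```

With one attribute per ciphertext, as in the experiments of Section 4, each encrypted attribute thus costs kilobytes instead of a few bytes.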
3.3 Private Log Analysis with HE
In this section, we detail our proposed protocol for protecting the privacy of log content. Our main idea is to make the SIEM analyze logs that are partially encrypted with an HE scheme. Figure 3 shows the three main entities of the log management architecture: the log generator, the SIEM and the SOC. The log generation step is carried out by an IDS, a firewall or any other information system. The SIEM and the SOC are distinct entities, hosted on remote servers or provided as cloud services.
First, the SOC runs the key generation algorithm (HE.Keygen) and gets $pk$, $evk$ and $sk$. Then, it transmits $pk$ to the log generator and the SIEM, and $evk$ to the SIEM only. The log generator chooses a symmetric key $k$ for a homomorphic-friendly symmetric cryptosystem (Canteaut et al., 2015). This key can be renewed periodically.
In step 2, the log generator encrypts this symmetric key with the homomorphic public key, obtaining $\mathrm{HE.Enc}_{pk}(k)$, and encrypts the generated (and previously normalized) logs $l$ with the symmetric key, obtaining $\mathrm{SYM.Enc}_k(l)$. The encrypted logs and the homomorphically encrypted symmetric key are sent to the SIEM. Note that the transmission of the encrypted key $\mathrm{HE.Enc}_{pk}(k)$ is independent of the transmission of the encrypted logs, but must occur before step 3.
In step 3, the SIEM uses $\mathrm{HE.Enc}_{pk}(k)$ and $\mathrm{SYM.Enc}_k(l)$ to obtain the logs $\mathrm{HE.Enc}_{pk}(l)$ encrypted in the homomorphic domain (without having access to the clear logs at any moment).
Step 4 consists in the analysis of the logs in homomorphic form with the algorithm $A$, which can be a machine learning algorithm or a rule-based process. The homomorphic result of this analysis, $\mathrm{HE.Enc}_{pk}(res)$, is sent to the SOC in step 5. The protocol also requires the transmission of $\mathrm{HE.Enc}_{pk}(k)$ to the SOC.
The SOC decrypts the result of the analysis and the symmetric key using the secret key $sk$ (step 6). Based on the obtained result $res$, it chooses an adapted security countermeasure, noted $rec$ (step 7), and sends it to the log generator encrypted with $k$ (step 8). The result can be a classification of an event as unknown or dangerous, and the countermeasure can be a patch deployment or an alert report. Finally, the log generator decrypts $rec$ and takes the appropriate actions (step 9).
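The online protocol of Figure 3 can be simulated end to end with toy primitives. The sketch below uses a toy Paillier scheme as HE and one-time additive pads as the homomorphic-friendly symmetric scheme; the analysis algorithm $A$ is a hypothetical integer linear model and the countermeasure code is made up. Because the toy pads are one-time, the log generator uses one pad per log field and a separate pad for the SOC answer, which departs from Figure 3, where a single key $k$ is reused. None of these parameters is secure.

```python
import math
import secrets

# Offline (SOC): toy Paillier keys. Tiny primes, NOT secure.
P, Q = 1009, 1013
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)  # Python 3.9+
MU = pow(LAM, -1, N)


def he_enc(m):  # uses only the public modulus N
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def he_dec(c):  # needs the SOC secret (LAM, MU)
    m = ((pow(c, LAM, N2) - 1) // N * MU) % N
    return m - N if m > N // 2 else m  # signed (centered) decoding


def he_add(c1, c2):
    return (c1 * c2) % N2


def he_mul_plain(c, k):
    return pow(c, k % N, N2)


# Hypothetical HE-friendly symmetric scheme: one-time additive pads over Z_N.
def sym_enc(pad, m):
    return (m + pad) % N


def sym_dec(pad, c):
    return (c - pad) % N


# Step 1 (log generator): one fresh pad per log field, plus one for the answer.
log = [3, 0, 7]  # hypothetical normalized, integer-encoded fields
pads = [secrets.randbelow(N) for _ in log]
k_rec = secrets.randbelow(N)

# Step 2 (log generator -> SIEM): symmetric ciphertexts and HE-encrypted pads.
sym_log = [sym_enc(k, l) for k, l in zip(pads, log)]
he_pads = [he_enc(k) for k in pads]
he_k_rec = he_enc(k_rec)

# Step 3 (SIEM): transcipher, i.e. evaluate SYM.Dec = c - pad under HE.
he_log = [he_add(he_enc(c), pow(ck, -1, N2)) for c, ck in zip(sym_log, he_pads)]

# Step 4 (SIEM): algorithm A = hypothetical integer linear model w.l + b.
w, b = [2, -5, -1], 4
he_res = he_enc(b)
for wi, ci in zip(w, he_log):
    he_res = he_add(he_res, he_mul_plain(ci, wi))

# Steps 5-6 (SIEM -> SOC, then SOC): decrypt the result and the answer pad.
res = he_dec(he_res)              # 2*3 - 5*0 - 1*7 + 4 = 3
k_rec_soc = he_dec(he_k_rec) % N  # back to the unsigned pad value

# Steps 7-8 (SOC -> log generator): choose and send the countermeasure.
rec = 1 if res >= 0 else 0        # hypothetical code: 1 = isolate host
sym_rec = sym_enc(k_rec_soc, rec)

# Step 9 (log generator): decrypt the countermeasure.
received_rec = sym_dec(k_rec, sym_rec)
print(res, received_rec)  # 3 1
```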
4 EXPERIMENTAL RESULTS
We propose to use a Support Vector Machine (SVM) with a linear kernel to classify NSL-KDD inputs using homomorphic encryption. The NSL-KDD dataset (Tavallaee et al., 2009) is an updated version of the KDDCUP’99 dataset. It is a public dataset for testing network-based anomaly detection systems. We use it to simulate logging inputs from an IDS. In the following, we review the SVM algorithm. Then, we present the performance of the classification of encrypted NSL-KDD inputs, obtained with the Microsoft SEAL library version 3.3 (SEAL, 2019).
4.1 Support Vector Machines
Support Vector Machines (SVM) are supervised learning algorithms used for regression and classification. The SVM classifier determines a set of vectors, called support vectors, that serve to construct a hyperplane in the feature space. In our simulation, we use SVM as a binary classifier to predict whether an NSL-KDD input comes from normal traffic or from an anomaly.
Given the training dataset $(\mathbf{x}_i, y_i)_{i \in [\![1, n]\!]}$, where $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$, we want to find with SVM the hyperplane $\mathbf{w}^t \cdot \mathbf{x} + b = 0$ that maximizes the margin, where $\mathbf{w}$ is the normal vector of the hyperplane and the parameter $b$ is an offset.
We solve the following optimization problem to find the optimal hyperplane:
$$\begin{cases} \min \left( \dfrac{\|\mathbf{w}\|^2}{2} + C \sum_{i=1}^{n} \varepsilon_i \right) \\ \text{s.t. } y_i (\mathbf{w}^t \cdot \mathbf{x}_i + b) \geq 1 - \varepsilon_i, \quad \varepsilon_i \geq 0, \quad \forall i \in [\![1, n]\!] \end{cases} \tag{1}$$
The term $\sum_{i=1}^{n} \varepsilon_i$ relaxes the constraints on the learning vectors, and $C$ is a constant that controls the tradeoff between the number of misclassifications and margin maximization. Equation 1 can be solved using Lagrange multipliers (Schölkopf and Smola, 2003):
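$$\begin{cases} \max \left( L(\alpha) = \sum_{i=1}^{n} \alpha_i - \dfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \right) \\ \text{s.t. } \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C, \quad \forall i \in [\![1, n]\!] \end{cases} \tag{2}$$
$K$ is the kernel function and $\alpha_i$, $i \in [\![1, n]\!]$, are the Lagrange multipliers. According to the Karush-Kuhn-Tucker (KKT) conditions (Karush, 2014; Kuhn and Tucker, 1951), the $\mathbf{x}_i$ that correspond to $\alpha_i > 0$ are called support vectors (SVs).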
Once the solution to Equation 2 is found, we get $\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i$. Thus, the decision function can be written as $\mathrm{sgn}(\mathbf{w}^t \cdot \mathbf{x} + b)$. If the returned sign is positive, then $\mathbf{x}$ belongs to the class above the separating hyperplane.
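A tiny worked example of Equation 2 and of the decision function, on a two-point training set whose dual solution is known in closed form: $\mathbf{x}_1 = (1, 0)$ with $y_1 = +1$ and $\mathbf{x}_2 = (-1, 0)$ with $y_2 = -1$ give $\alpha_1 = \alpha_2 = 1/2$ (for any $C \geq 1/2$) and $b = 0$. The code recovers $\mathbf{w}$ from the Lagrange multipliers and applies $\mathrm{sgn}(\mathbf{w}^t \cdot \mathbf{x} + b)$; mapping a zero score to $+1$ is a convention chosen for this sketch.

```python
# Toy training set: x1 = (1, 0) with label +1, x2 = (-1, 0) with label -1.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]  # dual solution of Equation 2 here (both points are SVs)
b = 0.0

# Dual constraint: sum_i alpha_i * y_i = 0
assert sum(a * yi for a, yi in zip(alpha, y)) == 0

# Primal normal vector: w = sum_i alpha_i * y_i * x_i
w = [sum(a * yi * xi[dim] for a, yi, xi in zip(alpha, y, X)) for dim in range(2)]


def decision(x):
    """Linear SVM decision function sgn(w^t . x + b), returning +1 or -1."""
    score = sum(wd * xd for wd, xd in zip(w, x)) + b
    return 1 if score >= 0 else -1


print(w)                      # [1.0, 0.0]
print(decision((0.3, 2.0)))   # 1
print(decision((-2.0, 0.5)))  # -1
```

Both training points lie exactly on the margin, $y_i(\mathbf{w}^t \cdot \mathbf{x}_i + b) = 1$, as expected for support vectors of a separable set.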
In this work, we use an SVM classifier with a linear kernel, $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \mathbf{x}_i, \mathbf{x}_j \rangle$. We do so to simplify the computations over vectors of encrypted features, as presented in the next section.
4.2 Performance
Training. We use Python 3 and the scikit-learn library, in particular its sklearn.svm package, for dataset preprocessing and SVM training.
For the training step, we use 80% of the 25192 labelled records of the KDDTrain+_20Percent dataset. Each entry of this dataset is a vector of 42 features. First, we analyse each feature and its range (minimum, maximum and standard deviation). We also ensure that no feature is missing in the dataset. Then, we convert all categorical attributes into numerical ones using label encoding, and we scale all attributes, except of course the class attribute, using the MinMaxScaler. Finally, we train an SVM model with a linear kernel.
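The preprocessing steps above can be sketched in pure Python to show what label encoding and min-max scaling compute (the paper relies on scikit-learn's implementations). The sorted-categories convention mirrors scikit-learn's LabelEncoder; the sample columns are hypothetical, except that protocol_type is a genuine categorical NSL-KDD attribute.

```python
def label_encode(column):
    """Map each category to its index in the sorted list of distinct values
    (the same convention as scikit-learn's LabelEncoder)."""
    classes = sorted(set(column))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in column]


def min_max_scale(column):
    """Rescale a numerical column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


protocol_type = ["tcp", "udp", "icmp", "tcp"]  # categorical NSL-KDD attribute
src_bytes = [181, 239, 0, 1032]                # hypothetical numerical values
print(label_encode(protocol_type))                 # [1, 2, 0, 1]
print(min_max_scale(label_encode(protocol_type)))  # [0.5, 1.0, 0.0, 0.5]
print(min_max_scale(src_bytes))
```

In a train/test setting, the minimum and maximum would be computed on the training split only and reused for the test split, which is what fitting a MinMaxScaler on the training data achieves.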
Classification. Once the training is finished, we run the prediction on the test set (i.e., the remaining 20% of KDDTrain+_20Percent). We get an accuracy of 95.653%, where accuracy is the ratio of correctly predicted labels to the total number of test records. We also extract the normal vector $\mathbf{w}$ and the offset $b$.
Classification of Encrypted Inputs. To classify a vector $\mathbf{x} = (x_1, \ldots, x_n)$ without revealing any $x_i$, we first encrypt it as $\mathbf{x}_{enc}$ with the API for the BFV encryption scheme (Fan and Vercauteren, 2012b) of the SEAL library:
$$\mathbf{x}_{enc} = \left(\mathrm{HE.Enc}_{pk}(x_1), \ldots, \mathrm{HE.Enc}_{pk}(x_n)\right) = \mathrm{HE.Enc}_{pk}(\mathbf{x})$$
Then, using the linear SVM prediction formula, we compute $res_{enc} = \mathbf{w}^t \cdot \mathbf{x}_{enc} + b$, where $\mathbf{w}^t$ and $b$ are in clear while $\mathbf{x}_{enc}$ is encrypted. We obtain an encrypted result $res_{enc}$. Then, we decrypt it as $res = \mathrm{HE.Dec}_{sk}(res_{enc})$ and check the sign of $res$ to get the final prediction result. In practice, $\mathbf{x}_{enc}$ corresponds to a private input from a log generator, while $\mathbf{w}^t$ is the public data of the SIEM. As such, the SIEM computes $res_{enc}$ and sends it to the SOC for decryption.
[Plot: classification accuracy (in %) versus scaling factor (1, 10, 100, 1000, 10000); data labels 95.653, 95.614, 95.554, 95.177 and 92.478.]
Figure 4: Prediction accuracy with respect to features scaling from reals to integers.
[Plot: elapsed time (in milliseconds) versus scaling factor (1, 10, 100, 1000, 10000) for $\mathbf{w}^t \cdot \mathbf{x}_{enc} + b$ and $\mathbf{w}^t_{enc} \cdot \mathbf{x}_{enc} + b_{enc}$.]
Figure 5: Prediction time with respect to features scaling from reals to integers.
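The encrypted prediction $res_{enc} = \mathbf{w}^t \cdot \mathbf{x}_{enc} + b$ only requires ciphertext additions and multiplications of ciphertexts by cleartext integers. The sketch below illustrates it with a toy Paillier scheme standing in for BFV (the paper uses BFV through SEAL, whose API is not reproduced here). The integer-encoded features, the model and the mapping of a non-negative score to "anomaly" are assumptions.

```python
import math
import secrets

# Toy Paillier scheme standing in for BFV (tiny primes, NOT secure).
P, Q = 1009, 1013
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)  # Python 3.9+
MU = pow(LAM, -1, N)


def enc(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def dec(c):
    m = ((pow(c, LAM, N2) - 1) // N * MU) % N
    return m - N if m > N // 2 else m  # signed decoding of negative scores


def encrypted_linear_svm(w, x_enc, b):
    """res_enc = w^t . x_enc + b, with w and b in clear and x_enc encrypted."""
    res_enc = enc(b)
    for wi, ci in zip(w, x_enc):
        # ci^wi encrypts wi * xi; multiplying ciphertexts adds the plaintexts.
        res_enc = (res_enc * pow(ci, wi % N, N2)) % N2
    return res_enc


x = [12, 0, 57, 3]             # log generator: integer-encoded features
x_enc = [enc(xi) for xi in x]  # one ciphertext per feature
w, b = [-4, 9, 1, -20], 10     # SIEM: clear integer model
res_enc = encrypted_linear_svm(w, x_enc, b)
res = dec(res_enc)             # SOC: decrypt, then keep only the sign
print(res, "anomaly" if res >= 0 else "normal")  # -41 normal
```

As in the paper's setting, the SIEM never decrypts anything: it only needs the public key to encrypt $b$ and the ciphertexts of $\mathbf{x}$.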
In the BFV cryptosystem, a plaintext is encoded as a polynomial of degree less than $n$ with integer coefficients in $\mathbb{Z}_t$. Meanwhile, a ciphertext is a set of two (or more) polynomials of degree less than $n$ with integer coefficients in $\mathbb{Z}_q$, where $q \gg t$.
To encrypt $\mathbf{x}$ with BFV, we use an appropriate scaling factor $s_f$ to transform the real values of $\mathbf{w}^t$, $\mathbf{x}$ and $b$ into integers. To get an integer $x_{\mathbb{Z}}$ from a real value $x_{\mathbb{R}}$, we multiply $x_{\mathbb{R}}$ by a scaling factor taken in $\{1, 10, 100, 1000, 10000\}$ and round the result down with the floor operator: $x_{\mathbb{Z}} = \lfloor x_{\mathbb{R}} \cdot s_f \rfloor$. Finally, we proceed with the classification of encrypted inputs. Our experiment confirms that, starting from $s_f = 10000$, the prediction over encrypted integers reaches the same accuracy as the prediction over real numbers (as presented in Figure 4).
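The effect of the scaling factor can be checked in plaintext. The sketch applies $x_{\mathbb{Z}} = \lfloor x_{\mathbb{R}} \cdot s_f \rfloor$ to a hypothetical model and input and compares the sign of the integer score with that of the real-valued score. Since $\mathbf{w}$ and $\mathbf{x}$ are both multiplied by $s_f$, their products carry a factor $s_f^2$, so the sketch scales $b$ by $s_f^2$ to keep all terms consistent; this handling of the offset is an assumption, not a detail given in the text.

```python
import math


def quantize(values, sf):
    """x_Z = floor(x_R * s_f) for every coordinate."""
    return [math.floor(v * sf) for v in values]


def real_score(w, x, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def integer_score(w, x, b, sf):
    # w and x are both scaled by sf, so each product carries sf**2;
    # the offset is scaled by sf**2 as well (assumption, see above).
    wq, xq = quantize(w, sf), quantize(x, sf)
    return sum(wi * xi for wi, xi in zip(wq, xq)) + math.floor(b * sf * sf)


w, b = [0.83, -1.27, 0.05], -0.04  # hypothetical real-valued model
x = [0.12, 0.07, 0.94]             # hypothetical min-max scaled input

agreement = {}
for sf in (1, 10, 100, 1000, 10000):
    # Does the integer prediction have the same sign as the real one?
    agreement[sf] = (integer_score(w, x, b, sf) >= 0) == (real_score(w, x, b) >= 0)
print(agreement)
```

With these made-up values, only $s_f = 1$ changes the predicted class, because rounding down erases most of the information carried by features in $[0, 1]$.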
Figure 5 shows that the prediction time for a single NSL-KDD input ranges from 0.7 ms to 5.5 ms depending on the scaling factor. Indeed, for each $s_f$ in $\{1, 10, 100, 1000, 10000\}$, each plaintext attribute $w^t_i$ is encoded as a polynomial with integer coefficients of size $\lceil \log_2(t) \rceil$ in $\{8, 12, 18, 25, 32\}$ (as presented in Table 1). Meanwhile, each attribute $\mathrm{HE.Enc}_{pk}(x_i)$ of $\mathbf{x}_{enc}$ is encoded as a polynomial with integer coefficients of size $\lceil \log_2(q) \rceil$. The size of the coefficients impacts the computation of the dot product between the ciphertext vector $\mathbf{x}_{enc}$ and the plaintext vector $\mathbf{w}^t$. Indeed, for the same security level (128 bits), the size of the coefficients ($\lceil \log_2(q) \rceil$) and the degree ($n$) of the ciphertext polynomial depend on the size of the coefficients of the plaintext polynomial ($\lceil \log_2(t) \rceil$), as presented in Table 1. The larger the coefficients, the longer the computation time.
We can also make encrypted predictions where not only $\mathbf{x}$ but also $\mathbf{w}^t$ and $b$ are encrypted. Computing $res_{enc}$ from fully encrypted inputs is interesting when two SIEMs managed by different providers collaborate in alert classification. However, these SIEMs may require that their respective models $\mathbf{w}^t$ remain secret due to intellectual property (IP) concerns. In this case, SIEM1 encrypts its model and sends $\mathbf{w1}^t_{enc}$ and $b1_{enc}$ to SIEM2. Then, SIEM2 computes $res_{enc} = \mathbf{w1}^t_{enc} \cdot \mathbf{x}_{enc} + b1_{enc}$.
Figure 5 shows that the prediction time increases drastically when all data are homomorphically encrypted: it ranges from 55 ms to 725 ms depending on the size of the plaintext coefficients (and thus on the scaling factor). In addition, the size of the plaintext coefficients impacts the polynomial degree $n$ and the ciphertext modulus $q$ in SEAL. Table 1 summarizes the values of $n$ and $q$ with respect to $t$ for a security level of 128 bits. Note that SEAL fixes the security parameters according to the homomorphicencryption.org security standard.
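The 128-bit bounds behind Table 1 can be written down as a small lookup. The values below are the maximum total bit lengths of $q$ for classical 128-bit security from the HomomorphicEncryption.org standard, which recent SEAL releases expose through `CoeffModulus::MaxBitCount`; the Python names are ours. The sketch only picks $n$ once the required size of $q$ is known; estimating that size from the noise growth (which depends on $t$ and on the circuit) is not modeled here.

```python
# Maximum bit length of the ciphertext modulus q for 128-bit classical
# security, indexed by the polynomial degree n (HomomorphicEncryption.org
# standard, as used by SEAL).
MAX_LOG2_Q_128 = {
    1024: 27,
    2048: 54,
    4096: 109,
    8192: 218,
    16384: 438,
    32768: 881,
}


def smallest_degree(log2_q: int) -> int:
    """Smallest degree n whose 128-bit bound admits a log2_q-bit modulus q."""
    for n in sorted(MAX_LOG2_Q_128):
        if log2_q <= MAX_LOG2_Q_128[n]:
            return n
    raise ValueError(f"no supported degree allows a {log2_q}-bit modulus")


# Each (n, ceil(log2 q)) pair of Table 1 sits exactly on the 128-bit bound.
for n, log2_q in [(1024, 27), (2048, 54), (4096, 109), (8192, 218)]:
    print(n, log2_q, log2_q == MAX_LOG2_Q_128[n], smallest_degree(log2_q) == n)
```

In other words, once the noise analysis requires a larger $q$, the degree $n$ must grow with it to keep 128-bit security, and every operation on the larger polynomials becomes slower.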
Table 1: BFV parameters in SEAL.
Prediction equation                                 | n    | $\lceil \log_2(t) \rceil$ | $\lceil \log_2(q) \rceil$
$res_{enc} = w^t \cdot x_{enc} + b$                 | 1024 | 8                         | 27
                                                    | 2048 | 12, 18, 25, 32            | 54
$res_{enc} = w^t_{enc} \cdot x_{enc} + b_{enc}$     | 2048 | 8, 12                     | 54
                                                    | 4096 | 18                        | 109
                                                    | 8192 | 25, 32                    | 218
5 CONCLUSION
In this work, we specified a privacy-preserving log management architecture based on homomorphic encryption. We showed how a SIEM can analyse encrypted log inputs using a linear SVM classifier. These preliminary experimental results, obtained with
the NSL-KDD dataset and the SEAL homomorphic library, are promising in terms of accuracy.
In the future, we plan to improve the execution times of the prediction over encrypted data. First, we will investigate the use of the plaintext space of the BFV scheme (i.e., batching) to compute the prediction results of many inputs simultaneously. Second, we will use other homomorphic encryption schemes, namely TFHE (Chillotti et al., 2016) or CKKS (Cheon et al., 2016), for our classification. TFHE and CKKS are interesting as they can work with non-integer values (elements of the real torus for TFHE, approximate real or complex numbers for CKKS). Third, we intend to implement and test other classification algorithms (e.g., neural networks) for the analysis of the encrypted logs.
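For the batching direction, the following plaintext mock-up (no encryption at all) shows one possible data layout. It rests on two properties of BFV batching as implemented in SEAL: the plain modulus $t$ must be a prime with $t \equiv 1 \pmod{2n}$, and a plaintext then holds $n$ integer slots on which additions and multiplications act element-wise. The attribute-major layout, the helper names and the toy parameters ($n = 8$, $t = 17$, far too small to be secure) are our assumptions; a real implementation would encode slots with SEAL's `BatchEncoder` rather than Python lists.

```python
def is_prime(m: int) -> bool:
    """Trial-division primality test (adequate for small toy moduli)."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def supports_batching(t: int, n: int) -> bool:
    """BFV batching needs a prime plain modulus t with t = 1 (mod 2n)."""
    return is_prime(t) and t % (2 * n) == 1


def pack_by_attribute(inputs, n):
    """Slot vector j holds attribute j of every input (one 'ciphertext' per attribute)."""
    if len(inputs) > n:
        raise ValueError("more inputs than slots")
    d = len(inputs[0])
    return [[x[j] for x in inputs] + [0] * (n - len(inputs)) for j in range(d)]


def slotwise_predict(w, b, packed, t):
    """Mimics d plaintext-ciphertext products and additions applied to all slots at once."""
    n = len(packed[0])
    acc = [b % t] * n
    for w_j, slots in zip(w, packed):
        acc = [(a + w_j * s) % t for a, s in zip(acc, slots)]
    return acc


def centered(v, t):
    """Map a residue modulo t to the range (-t/2, t/2]."""
    return v - t if v > t // 2 else v


# Toy parameters: n = 8 slots, t = 17 (prime, and 17 = 1 mod 16).
n, t = 8, 17
w, b = [2, -1, 3], -4
inputs = [[1, 0, 2], [0, 3, 1], [2, 2, 0]]
assert supports_batching(t, n)
slots = slotwise_predict(w, b, pack_by_attribute(inputs, n), t)
result = [centered(v, t) for v in slots[: len(inputs)]]
print(result)  # prints [4, -4, -2]
```

With this layout, classifying up to $n$ inputs costs $d$ slot-wise products and additions instead of $d$ per input, and the sign of each slot gives the class of the corresponding input, provided $t$ is large enough that no result wraps around.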
REFERENCES
Aburomman, A. A. and Reaz, M. B. I. (2017). A survey
of intrusion detection systems based on ensemble and
hybrid classifiers. Computers & Security, 65:135–152.
Acar, A., Aksu, H., Uluagac, A. S., and Conti, M.
(2017). A survey on homomorphic encryption
schemes: Theory and implementation. arXiv preprint
arXiv:1704.03578.
Allison, J., Evans, J., Filkens, B., Moye, O., Northcutt, S.,
Read, J., Torres, A., and Wityszyn, M. (2013). The 6
Categories of Critical Log Information.
Beng, L. Y., Ramadass, S., Manickam, S., and Fun, T. S.
(2013). A comparative study of alert correlations
for intrusion detection. In 2013 International Con-
ference on Advanced Computer Science Applications
and Technologies, pages 85–88.
Boura, C., Gama, N., Georgieva, M., and Jetchev, D.
(2018). Chimera: Combining ring-lwe-based fully ho-
momorphic encryption schemes. Cryptology ePrint
Archive, Report 2018/758. https://eprint.iacr.org/2018/758.
Bourse, F., Minelli, M., Minihold, M., and Paillier, P.
(2017). Fast homomorphic evaluation of deep dis-
cretized neural networks. Cryptology ePrint Archive,
Report 2017/1114. https://eprint.iacr.org/2017/1114.
Brakerski, Z., Gentry, C., and Vaikuntanathan, V. (2012).
(Leveled) Fully Homomorphic Encryption Without
Bootstrapping. In Proceedings of the 3rd Innovations
in Theoretical Computer Science Conference, ITCS
’12, pages 309–325.
Brakerski, Z. and Vaikuntanathan, V. (2011). Fully Homo-
morphic Encryption from Ring-LWE and Security for
Key Dependent Messages. In CRYPTO, volume 6841
of Lecture Notes in Computer Science, pages 505–
524. Springer.
Canteaut, A., Carpov, S., Fontaine, C., Lepoint, T., Naya-
Plasencia, M., Paillier, P., and Sirdey, R. (2015).
Stream ciphers: A practical solution for efficient
homomorphic-ciphertext compression. Cryptology
ePrint Archive, Report 2015/113. https://eprint.iacr.org/2015/113.
Cheon, J. H., Kim, A., Kim, M., and Song, Y. (2016).
Homomorphic encryption for arithmetic of approxi-
mate numbers. Cryptology ePrint Archive, Report
2016/421. https://eprint.iacr.org/2016/421.
Chillotti, I., Gama, N., Georgieva, M., and Izabachène,
M. (2016). Faster fully homomorphic encryption:
Bootstrapping in less than 0.1 seconds. In Advances
in Cryptology–ASIACRYPT 2016: 22nd International Conference on the Theory and Application of Cryptology and Information Security, Hanoi, Vietnam, December 4-8, 2016, Proceedings, Part I, pages 3–33. Springer.
Dali, L., Bentajer, A., Abdelmajid, E., Abouelmehdi, K.,
Elsayed, H., Fatiha, E., and Abderahim, B. (2015).
A survey of intrusion detection system. In 2015 2nd
World Symposium on Web Applications and Network-
ing (WSWAN), pages 1–6.
European Parliament and Council (2016). REGULA-
TION (EU) 2016/679 OF THE EUROPEAN PAR-
LIAMENT AND OF THE COUNCIL of 27 April
2016 on the protection of natural persons with re-
gard to the processing of personal data and on the
free movement of such data, and repealing Directive
95/46/EC (General Data Protection Regulation).
Fan, J. and Vercauteren, F. (2012a). Somewhat practi-
cal fully homomorphic encryption. IACR Cryptology
ePrint Archive, 2012:144.
Fan, J. and Vercauteren, F. (2012b). Somewhat practi-
cal fully homomorphic encryption. Cryptology ePrint
Archive, Report 2012/144. https://eprint.iacr.org/2012/144.
Gentry, C. (2009). Fully homomorphic encryption using ideal lattices. In STOC, pages 169–178.
Jarpey, G. and McCoy, R. S. (2017). Chapter 1 - what is a
security operations center? In Jarpey, G. and McCoy, R. S., editors, Security Operations Center Guidebook, pages 3–10. Butterworth-Heinemann, Boston.
Karush, W. (2014). Minima of Functions of Several Vari-
ables with Inequalities as Side Conditions, pages 217–
245. Springer Basel, Basel.
Kuhn, H. W. and Tucker, A. W. (1951). Nonlinear pro-
gramming. In Proceedings of the Second Berkeley
Symposium on Mathematical Statistics and Probabil-
ity, pages 481–492, Berkeley, Calif. University of Cal-
ifornia Press.
Limmer, T. and Dressler, F. (2008). Survey of event corre-
lation techniques for attack detection in early warning
systems.
Lin, X., Wang, P., and Wu, B. (2013). Log analysis in cloud
computing environment with hadoop and spark. In
2013 5th IEEE International Conference on Broad-
band Network Multimedia Technology, pages 273–
276.
ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy
López-Alt, A., Tromer, E., and Vaikuntanathan, V. (2012).
On-the-fly multiparty computation on the cloud via
multikey fully homomorphic encryption. In Proceed-
ings of the forty-fourth annual ACM symposium on
Theory of computing, pages 1219–1234. ACM.
Nathans, D. (2015). Chapter 1 - efficient operations: Build-
ing an operations center from the ground up. In
Nathans, D., editor, Designing and Building Security
Operations Center, pages 1–24. Syngress.
Ray, I., Belyaev, K., Strizhov, M., Mulamba, D., and
Rajaram, M. (2013). Secure logging as a ser-
vice—delegating log management to the cloud. IEEE
Systems Journal, 7(2):323–334.
Rivest, R. L., Adleman, L., and Dertouzos, M. L. (1978).
On data banks and privacy homomorphisms. Founda-
tions of secure computation, 4(11):169–180.
Scarfone, K. K. and Souppaya, M. P. (2006). NIST Spe-
cial Publication - 800-92: Guide to Computer Security
Log Management.
Schölkopf, B. and Smola, A. J. (2003). A Short Introduc-
tion to Learning with Kernels, pages 41–64. Springer
Berlin Heidelberg, Berlin, Heidelberg.
SEAL (2019). Microsoft SEAL (release 3.3). https://github.com/Microsoft/SEAL. Microsoft Research,
Redmond, WA.
Tavallaee, M., Bagheri, E., Lu, W., and Ghorbani, A. A.
(2009). A detailed analysis of the kdd cup 99 data set.
In 2009 IEEE Symposium on Computational Intelli-
gence for Security and Defense Applications, pages
1–6.
Van Dijk, M., Gentry, C., Halevi, S., and Vaikuntanathan, V.
(2010). Fully homomorphic encryption over the inte-
gers. In Annual International Conference on the The-
ory and Applications of Cryptographic Techniques,
pages 24–43. Springer.