Identifying Users’ Emotional States through Keystroke Dynamics
Stefano Marrone (https://orcid.org/0000-0001-6852-0377) and Carlo Sansone (https://orcid.org/0000-0002-8176-6950)
Department of Information Technology and Electrical Engineering,
University of Naples Federico II, Via Claudio, 21, Napoli, Italy
Keywords:
Cyberbullying, Keystroke Dynamics, Emotion Recognition, Deep Learning.
Abstract:
Recognising users’ emotional states is among the most pursued tasks in the field of affective computing.
Although several works show promising results, they usually require expensive or intrusive hardware. Keystroke
Dynamics (KD) is a behavioural biometric whose typical aim is to identify or confirm the identity of an
individual by analysing habitual rhythm patterns as they type on a keyboard. This work focuses on the use of
KD as a way to continuously predict users’ emotional states during message writing sessions. In particular,
we introduce a time-windowing approach that allows analysing users’ writing sessions in different batches,
even when the considered writing window is relatively small. This is very relevant in the field of social media,
where the exchanged messages are usually very short and the typing rhythm is very fast. The obtained results
suggest that even very short writing windows (in the order of 30 seconds) are sufficient to recognise the subject’s
emotional state with the same level of accuracy as systems based on the analysis of larger writing sessions
(i.e., up to a few minutes).
1 INTRODUCTION
Emotions play a fundamental role in human life, in-
fluencing the mental and physiological processes of
our species. Emotions can be defined as complex response configurations, selected during the course of
evolution to favour the adaptation of the organism to an environment from which it receives stimuli or
representations that upset its equilibrium. As response mechanisms, emotions often involve similar
neurophysiological and biochemical modifications, assuming a social and relational significance within
the species. In other cases, however, they may manifest themselves differently, modulated by the
subjective experiences of each individual.
Affective computing, sometimes also referred to
as Artificial Emotional Intelligence, is the branch of
Artificial Intelligence (AI) that develops technologies
able to recognise and express emotions (Tao and Tan,
2005). By virtue of this new perspective, in which classic AI is integrated with emotional intelligence, we
now speak of emotional AI. Advances in affective computing technology have led to the growth of
emotion recognition research in recent years. Sys-
tems able to perceive emotions bring multiple benefits: they are useful both to users, who become more
aware of the emotions they are showing, and to developers, who can exploit emotion recognition to
adapt their applications to the user’s experience, as well as to support the detection of cognitive
disorders, anxiety, or stress. The latter can be extremely useful in the detection of (cyber)bullying,
a situation in which negative emotional states can affect the mental health of (usually young)
subjects (Sansone and Sperlí, 2021). As
a consequence, several approaches have been devel-
oped for the automatic detection of emotions, for ex-
ample by conducting voice intonation analysis, facial
expression analysis or using physiological sensors.
Yet, they usually require expensive, intrusive or hard-to-use hardware (Fragopanagos and Taylor, 2005).
Biometrics is a term referring to body measure-
ments and statistical analyses intended to extract
and quantify human characteristics. This technol-
ogy, mostly used for users’ authentication or iden-
tification purposes (Jain et al., 2000), has increas-
ingly been used for other aims, including entertain-
ment and user-experience personalization (Mandryk
and Nacke, 2016). Biometric approaches can be
grouped into two distinct categories, based on the type
of unique characteristic they try to leverage:
Physiological, referring to a direct physical mea-
sure of some human body parts, such as the face,
fingerprint, iris, retina, voice, etc;
Behavioural, referring to specific behaviours of a
human while performing an action, such as hand-
writing, typing, speaking, and so on.
Among all, keystroke dynamics is considered one of the most effective and cheap (i.e., easy to implement
using already available hardware) behavioural biometrics. In recent years, it has been increasingly
used to enforce user authentication by analysing the habitual rhythm patterns of users as they type on a
keyboard (whether physical or virtual), so that a compromised password will not necessarily result in a
compromised system (Karnan et al., 2011).
In this work, we instead focus on the use of keystroke dynamics for user emotion recognition.
We believe that it could become the cheapest and most widely available method for emotion recognition, as
the only hardware it requires is a common keyboard. Additionally, a keystroke recorder can be either
hardware or software, with the latter approach being very unobtrusive, so that a person using the keyboard is
unaware that their actions are being monitored, resulting in an unbiased typing rhythm. In particular,
we introduce a time-windowing approach that allows
analysing users’ writing sessions in different batches,
even when the considered writing window is rela-
tively small. This is very relevant in the field of so-
cial media, where the exchanged messages are usually
very small and the typing rhythm is very fast.
The rest of the paper is organised as follows: Sec-
tion 2 presents the EmoSurv dataset as well as the ap-
proach used for recognising emotions on short writ-
ing windows; Section 3 shows the experimental setup,
while the obtained results are reported in Section 4.
Finally, Section 5 draws some conclusions and outlines future work.
2 THE EmoSurv DATASET
EmoSurv (Maalej and Kallel, 2020) is a recent dataset containing keystroke data for 124 subjects along
with the associated emotion labels, grouped into five classes: Anger, Happiness, Calmness, Sadness, and
Neutral State. Timing and frequency data were recorded while participants were typing free and fixed
texts before and after a specific emotion was induced through the visualisation of a video on an
interactive web application (www.emosurv.tech). To perform the data collection
Figure 1: The EmoSurv fixed texts, by emotion.
process, the application guides each participant to go
through the following tasks:
1. The subject has to answer a list of questions about some demographic characteristics, such as
age range, sex and the number of fingers they use to type. The answers are stored in a table;
2. The subject has to type a free and a fixed text before the emotion induction process, under the
assumption that they are in their neutral state;
3. A specific emotion-eliciting video is shown;
4. When the subject has watched the whole video, they are asked to answer some focus-checking
questions to make sure they watched the entire video and were not distracted. If the answers are
wrong, the corresponding data are discarded;
5. The participant is asked to type a free and a fixed
text just after finishing the video;
6. Finally, the subject can choose to leave the application, or continue the data acquisition process
and watch another video to experience a different emotional state.
The dataset also comes with some pre-extracted features, based on digraphs and trigraphs, namely
combinations of two or three consecutive keystroke events.
The data is organised into four .csv files:
The Fixed Text Typing Dataset was collected
while the participants were typing a fixed text
(e.g., copying a prompted message) and it in-
cludes features such as the user id, the emotion in-
dex (e.g., ‘H’ for Happy, ‘S’ for Sad, etc.), the spe-
cific pressed key, the answer to the focus-related
question and seven features associated with keys
press-release combinations and timing;
Table 1: Number of sentences (#Sen), of characters (#Char)
and average writing time (AvgT, in seconds, over all the
recorded sessions) for the fixed-text sentences.
Emotion #Sen #Char AvgT
Angry 24 4158 59
Calm 31 5156 68
Happy 36 4514 52
Neutral 116 26762 110
Sad 32 4793 60
The Free Text Typing Dataset was collected
while the participants were typing a free text
and includes the same features as the Fixed Text
Dataset;
The Frequency Dataset includes frequency-related features, such as the relative frequency of the
delete and backspace keys, and the time required to write the sentence;
The Participants Information Dataset includes
demographic information such as gender, age
range, status, country, etc. It also contains infor-
mation about the writing style, such as whether
the participant types with one hand or two hands,
using one or more fingers.
For each of the five classifiable emotions there is
a corresponding emotion-inducing video and a fixed
sentence the participant is asked to type after watch-
ing the video. These fixed texts are inherent to the induced emotion and have different lengths (Fig. 1).
Each subject has typed at least the sentence relating to the neutral emotion, but not all subjects have typed
the sentences relating to the remaining four emotions. This depends on which video was shown to a
specific subject, and on how many times they decided to repeat the data collection process. It can be easily
noted that the sentence related to the Neutral emotion is the longest among all of the sentences. Table 1
reports the number of sentences and of samples (characters), as well as the average duration, in seconds, of
the typing sessions registered for each emotion.
2.1 Data Pre-processing
By analysing the dataset, it was found that the number of unique userIds was 83, contrary to the
124 declared in the documentation. However, it was
noted that some ids were repeated multiple times in
the dataset, meaning that they were assigned to more
than one unique data acquisition session. For exam-
ple, the user whose id is 93 was assigned to four dif-
ferent typing sessions. This was interpreted as a mis-
take in the data registration. Thus, the “UserId” col-
umn has been modified to make sure that subsequent
sessions with repeated userId were assigned a differ-
ent and fresh id. The same change was carefully ap-
plied in the Participant Information Dataset as well.
As a result of this operation, the number of userIds
increased from 83 to 116 (still less than the 124 de-
clared in the documentation). Also, we removed all the instances (rows) presenting an erroneous value
(1.58 × 10^12) in any of the available columns or with a NaN for the “D1U1” feature (for the other features,
NaN is allowed). After these operations, the number of characters in the free text dataset is reduced from
46871 to 45358.
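A minimal pandas sketch of this cleaning step could read as follows. The column names (“UserId”, “D1U1”) follow the dataset documentation, while the session column used to separate repeated ids is hypothetical, as the criterion for telling distinct acquisition sessions apart is not part of the released files.

```python
import pandas as pd

ERRONEOUS = 1.58e12  # sentinel timing value found in some rows (see above)

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with the erroneous value in any column or a NaN D1U1."""
    df = df[~df.isin([ERRONEOUS]).any(axis=1)]
    return df.dropna(subset=["D1U1"])  # NaN is allowed in the other features

def reassign_ids(df: pd.DataFrame, session_col: str = "session") -> pd.DataFrame:
    """Assign a fresh id to each (UserId, session) pair, so that repeated
    UserIds no longer collapse distinct acquisition sessions.
    `session_col` is a hypothetical column identifying each session."""
    df = df.copy()
    df["UserId"] = df.groupby(["UserId", session_col]).ngroup()
    return df
```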
2.2 Feature Extraction
The features already made available with the dataset
are related to a single keystroke (D1U1), digraphs
(D1D2, U1D2, D2, D3) and trigraphs (D1D3, D1U3).
These features may not be suited for emotion recog-
nition as they are extremely local. Instead, we believe
that studying the typing rhythm of the user over a cer-
tain interval of time could result in a better perfor-
mance. Thus, in this work we leverage 20 high-level
features based on the dwell time (i.e., the time elapsed
between a key press and the same key release), on the
flight time (i.e., the time elapsed between a key re-
lease and the next key press) and on the D2D-time
(down to down, i.e., the time elapsed between a key
press and the next key press):
CPMilli: number of characters pressed in the se-
lected time window;
mode-dwell: mode of the dwell time of keys
pressed in the selected time window;
stdDev-dwell: standard deviation of the dwell
time of keys pressed in the selected time window;
stdVar-dwell: variance of the dwell time
of keys pressed in the selected time window;
range-dwell: range of the dwell time of keys
pressed in the selected time window;
min-dwell: minimum dwell time of keys pressed
in the selected time window;
max-dwell: maximum dwell time of keys pressed in
the selected time window;
mode-flight: mode of the flight time of keys
pressed in the selected time window;
stdDev-flight: standard deviation of the flight time
of keys pressed in the selected time window;
stdVar-flight: variance of the flight time
of keys pressed in the selected time window;
range-flight: range of the flight time of keys
pressed in the selected time window;
min-flight: minimum of the flight time of keys
pressed in the selected time window;
max-flight: maximum of the flight time of keys
pressed in the selected time window;
mode-d2d: mode of the down to down time of
keys pressed in the selected time window;
stdDev-d2d: standard deviation of the down to
down time of keys pressed in the selected time
window;
stdVar-d2d: variance of the down to
down time of keys pressed in the selected time
window;
range-d2d: range of the down to down time of
keys pressed in the selected time window;
min-d2d: minimum of the down to down time of
keys pressed in the selected time window;
max-d2d: maximum of the down to down time of
keys pressed in the selected time window;
num-deletes: number of times the backspace key
was pressed in the selected time window.
In previous works on this topic, similar high-level fea-
tures were extracted while taking into consideration
the entire typing session, i.e., the time it took the user
to type the fixed sentence. As already pointed out be-
fore, in this work we want to build a model able to
identify users’ emotions even when the available typ-
ing session is not very long. Therefore, a sliding win-
dow mechanism was applied by considering only the
keys the user pressed over a fixed time window (for
example, 10 seconds). This means that, given the reg-
istered data for a session and a fixed time window, the
value of every high-level feature was calculated only
for the keys pressed in every time window identified.
As a consequence, from each sentence we extract a
matrix having:
Exactly 20 columns (one for each feature);
A number of rows different for each sentence,
based on i) the total time it took the participant
to type the requested, ii) the chosen time window
and iii) the considered stride value (i.e., how much
the windows are distant each other during the slid-
ing operation).
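The following sketch assumes per-key press and release timestamps (in seconds) are available as NumPy arrays; the backspace label and the use of SciPy's mode for the mode features are assumptions made for illustration, not part of the original implementation.

```python
import numpy as np
from scipy import stats

def window_features(press, release, keys, win=15.0, stride=7.5):
    """Compute the 20 high-level features for each sliding time window.

    press/release: key-down/key-up timestamps (seconds) of one session;
    keys: array of the corresponding key labels ("Backspace" assumed
    to mark deletions).
    """
    rows = []
    start = press.min()
    while start < press.max():
        idx = (press >= start) & (press < start + win)
        if idx.sum() >= 2:  # need at least two keys for flight/d2d times
            dwell = release[idx] - press[idx]
            flight = press[idx][1:] - release[idx][:-1]
            d2d = np.diff(press[idx])
            feats = [int(idx.sum())]  # CPMilli
            for v in (dwell, flight, d2d):
                feats += [stats.mode(v, keepdims=False).mode,  # mode
                          v.std(), v.var(),                    # stdDev, stdVar
                          np.ptp(v), v.min(), v.max()]         # range, min, max
            feats.append(int(np.sum(keys[idx] == "Backspace")))  # num-deletes
            rows.append(feats)
        start += stride
    return np.asarray(rows)  # one row per window, 20 columns
```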
We also leverage the demographic features, from the
Participants Information Dataset. It is worth noting
that these features are unique for each subject (and, of
course, for each emotion registered for that subject)
and they thus assume the same value regardless of the
considered window.
Table 2: Number of rows (i.e., number of extracted windows) for each emotion of subject ID 41, as the window
size (WS) and stride (in seconds) vary.

Emotion (WS/Stride) 15/3 15/4 15/7.5 10/5
Neutral 18 14 8 12
Calm 13 10 5 9
Sad 15 12 6 10
2.3 Windowing and Sample Size
As mentioned in Section 1, in this work we pro-
pose to perform text-based emotion recognition by
introducing a time-windowing approach that allows
analysing users’ writing sessions in different batches,
even when the considered writing window is rela-
tively small. As a consequence, the feature extrac-
tion process (Sec. 2.2) can be strongly impacted by
the values chosen for the window size and the stride,
with the number of rows extracted from each sentence
being inversely linked to those parameters (Tab. 2).
Given the characteristics of the considered dataset, in
this work we will use 15 seconds and 7.5 seconds for
the window size and the stride respectively.
It is worth noting that the number of rows ex-
tracted from each sentence still depends on the length
(number of words) of the sentence itself. However, in
some situations it would be preferable to work with
samples all having the same size. In our windowing
scenario, a possible solution to achieve this is to fix
the desired number of Rows in each Sample (RS from
now on) and extract several samples from the same
sentence by considering (possibly overlapping) sub-
portions of the original features matrix, all having the
same RS value. Figure 2 illustrates this procedure for
a feature matrix of 7 rows, considering RS set to 3.
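A minimal sketch of this sub-sample extraction, mirroring the procedure of Figure 2, could read as follows.

```python
import numpy as np

def subsamples(matrix: np.ndarray, rs: int = 3) -> list:
    """Split a (n_windows x 20) feature matrix into RS-row sub-samples.

    Non-overlapping RS-row blocks are taken first; when the number of rows
    is not a multiple of RS, one extra block aligned to the last row is
    added, so the last two sub-samples partially overlap (as in Figure 2).
    """
    n = matrix.shape[0]
    if n < rs:
        return []  # samples with fewer than RS rows are discarded
    starts = list(range(0, n - rs + 1, rs))
    if starts[-1] != n - rs:
        starts.append(n - rs)  # tail block, overlapping the previous one
    return [matrix[s:s + rs] for s in starts]
```

For the 7-row matrix of Figure 2 and RS = 3, this yields three sub-samples covering rows 1–3, 4–6 and 5–7, the last two being partially overlapped.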
As a consequence, for each user and sentence (i.e.,
emotion registered for that user), there will no longer
be a single sample but multiple sub-samples, all shar-
ing the same class. An interesting side-effect of this
approach is an increase in the available training sam-
ples. It must be noted that once a particular RS value
is chosen, all samples consisting of a number of rows
smaller than RS have to be removed. Therefore, the
total number of subjects and of unique sentences may
decrease. Table 3 reports a brief overview of this as-
pect, in terms of users, sentences and samples.
It is also worth noting that considering the original samples, each having a different number of rows,
and extracting sub-samples all having the same number of rows are both viable approaches to the problem. In
both cases, as a sample consists of several rows (variable in the former, fixed in the latter), we are facing a
multi-instance problem. In the following sections, we will compare the two approaches under
different configurations.

Table 3: Number of usable subjects and corresponding
number of sentences and sub-samples as the considered RS
value varies.

RS #Subjects #Sentences #Sub-Samples
3 115 231 705
4 113 218 547
5 111 202 444
6 108 174 350
3 EXPERIMENTAL SETUP
As described in the previous section, we are facing
a multi-instance problem with samples consisting of
matrices having i) the same number of columns (fea-
tures), ii) different or fixed number of rows (based
on the considered approach) and iii) assigned to one
of the five possible classes available in the EmoSurv
dataset. Concerning the latter aspect, although the problem could be addressed as a multi-class
classification task, given the reduced number of available samples, in this work we face the problem by using
a 1-vs-all binarization approach. This means that we will consider five models (one per class), each trained
on the binary task of determining whether the considered sample belongs to the model’s class or not.
During the inference stage, all the new samples will
be fed to all the five models, using a voting strategy
to determine the class. In this work, we will explore
majority voting and highest-probability strategies.
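As an illustration, the inference stage could be sketched as follows; the binary classifiers are assumed to expose a scikit-learn-style predict_proba, which is an assumption on the interface rather than part of the paper.

```python
from collections import Counter

CLASSES = ["Angry", "Calm", "Happy", "Neutral", "Sad"]

def highest_probability_vote(models, x):
    """Pick the class whose 1-vs-all model is most confident about x.
    `models` maps each class name to a fitted binary classifier; x is a
    flattened feature vector for one sample."""
    probs = {c: models[c].predict_proba(x.reshape(1, -1))[0, 1] for c in CLASSES}
    return max(probs, key=probs.get)

def majority_vote(models, sub_samples):
    """Aggregate the per-sub-sample predictions of one sentence."""
    votes = [highest_probability_vote(models, s) for s in sub_samples]
    return Counter(votes).most_common(1)[0][0]
```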
Figure 2: Illustration of the sub-sample extraction strategy with an RS value of 3 for a feature matrix
consisting of 7 rows and some columns (represented by the dots). The three colours highlight the three
obtained sub-samples. It is worth noting that, as the number of rows is not a multiple of RS, the last two
sub-samples partially overlap.

Figure 3: Structure of the proposed CNN.

Focusing on the multi-instance side of the problem, each sample is thus a bag consisting of all the
rows composing the matrix. The size of these bags
is variable (see Section 2), but can be fixed by using
the windowing approach described in Section 2.3. Although Multi-Instance Learning (MIL) does not
necessarily require bags consisting of the same number of elements (Foulds and Frank, 2010), this opportunity
opens up different experimental scenarios. In particu-
lar, in this work we will:
train a MIL Support Vector Machine (SVM)
model (Andrews et al., 2002; Doran and Ray,
2014) on the original bags. This implies that each
sentence corresponds to a single bag containing
all the features extracted in all the time windows
considered for that typing session;
train a MIL-SVM model on the dataset consisting
of the fixed-size bags. This implies that each sen-
tence corresponds to one or more bags, each con-
taining all the features extracted in the time win-
dows associated with a sub-portion of that typing
session;
train a Convolutional Neural Network (CNN) on
the fixed-size bags.
Focusing on the MIL-SVM setups, since all the items in each bag are associated with the same class, in
this work we leverage the Normalized Set Kernel (NSK) approach (Gärtner et al., 2002) to use a
MIL-aware kernel to map entire bags into features, before using the standard SVM formulation to find
bag classifiers.

Moving to the CNN setup, it is worth noting that the fixed-bags dataset makes it possible to
consider each sample as a sort of “image-like feature map”, able to keep track of both semantic (on the
columns) and temporal (on the rows) features. This said, in this work we designed a simple CNN from
scratch, consisting of two convolutional layers, a max pooling, drop-out (set to 0.3 to reduce overfitting) and
two dense layers (Figure 3).
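Since the paper only specifies the overall structure, the following is a minimal PyTorch sketch consistent with Figure 3; kernel sizes and channel counts are assumptions.

```python
import torch.nn as nn

class EmotionCNN(nn.Module):
    """Two convolutional layers, max pooling, 0.3 drop-out and two dense
    layers, as described above; one such binary network is instantiated
    per class in the 1-vs-all scheme."""

    def __init__(self, rs: int = 5, n_features: int = 20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # input: 1 x RS x 20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # -> 32 x (RS // 2) x (n_features // 2)
            nn.Dropout(0.3),  # reduces overfitting
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (rs // 2) * (n_features // 2), 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # binary output for the 1-vs-all task
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

With RS = 5, each sample enters the network as a 1 × 5 × 20 tensor, and the model can be trained with, e.g., nn.BCEWithLogitsLoss.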
3.1 Class Balancing
As described in Section 2, the dataset is heavily imbalanced towards the “neutral” class. While SVMs
can deal with this problem, the considered CNN may easily tend to diverge, especially as a consequence
of the reduced size of the considered dataset. Thus,
in this work we experiment with the use of the pro-
posed CNN together with four different balancing
techniques:
Class Weights: different weights are assigned to
both the majority and minority classes to prevent
the considered models from predicting the more
frequent class more often than the others;
Random Undersampling: consists in reduc-
ing the number of samples of the majority class
through a random selection of samples to be
dropped;
Oversampling: consists in increasing the num-
ber of samples of the minority classes through the
use of SMOTE (Chawla et al., 2002), an approach
performing synthetic data augmentation based on
the original training data;
Under-oversampling: consists of deleting the
samples from the majority class (as in random un-
dersampling) before duplicating the samples from
the minority classes (as in oversampling).
An interesting side effect of the oversampling approach is the further increase of the training data.
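The balancing step could be sketched as follows with the imbalanced-learn library; the library choice, variable names and undersampling target are assumptions made for illustration, as the paper only states that SMOTE is used for oversampling. Balancing is applied to the training split only, on flattened sub-samples.

```python
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# X_train: (n_samples, RS, 20) sub-samples; y_train: emotion labels.
X_flat = X_train.reshape(len(X_train), -1)  # the samplers expect flat vectors

# Oversampling: synthetic minority samples via SMOTE (Chawla et al., 2002)
X_os, y_os = SMOTE(random_state=0).fit_resample(X_flat, y_train)

# Under-oversampling: first shrink the majority ("N" = Neutral) class to an
# illustrative target count, then let SMOTE raise the minority classes.
under = RandomUnderSampler(sampling_strategy={"N": 200}, random_state=0)
X_us, y_us = under.fit_resample(X_flat, y_train)
X_uos, y_uos = SMOTE(random_state=0).fit_resample(X_us, y_us)
```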
4 RESULTS
This section reports the results of the proposed anal-
ysis. All the tests were run by using a 60/20/20 strat-
ified subject-based hold-out random splitting to gen-
erate the training, the validation and the test set re-
spectively from the considered dataset. The number
of rows per sample (RS) has been set to 5, to avoid overly small samples while also not excluding
too many subjects. Also, to avoid unfair results,
all the balancing techniques (sec. 3.1) were applied
only to the training set. The used MIL-SVM classifier is based on a Python implementation publicly
available on GitHub (github.com/garydoranjr/misvm), while the considered CNN has been written from
scratch in PyTorch. All the other pre-processing and elaborations (including data cleaning, balancing,
etc.) were performed in Python. The experiments were run on a physical server equipped with 2x
Intel(R) Xeon(R) CPUs (4 cores each, running at 2.13GHz), 32GB of DDR4 RAM and an Nvidia Titan
XP GPU (Pascal family) having 12GB of GDDR5X RAM, hosted in our HPC center.

Table 4: Comparison of the analysed setups, in terms
of classification accuracy (Acc), precision (Pre), recall
(Rec) and F1-score (F1), varying the bag type (Fixed
Bags - FB, Variable Bags - VB), the balancing technique
(Class weights - CW, Undersampling - US, Oversampling
- OS, Under-oversampling - UOS) and the voting approach
(Highest probability voting - HPV, Most-frequent voting -
MV). Best results are reported in bold.

Approach Acc Pre Rec F1
CNN CW-HPV 0.48 0.58 0.48 0.50
CNN CW-MV 0.44 0.56 0.43 0.43
CNN US-HPV 0.57 0.43 0.57 0.48
CNN US-MV 0.57 0.43 0.57 0.48
CNN OS-HPV 0.46 0.45 0.46 0.43
CNN OS-MV 0.41 0.43 0.41 0.40
CNN UOS-HPV 0.52 0.48 0.52 0.49
CNN UOS-MV 0.54 0.50 0.54 0.50
MIL-SVM VB 0.76 0.80 0.69 0.74
MIL-SVM FB-HPV 0.52 0.60 0.52 0.53
MIL-SVM FB-MV 0.48 0.52 0.48 0.47
Table 4 reports the results obtained by using the
analysed setups, in terms of classification accuracy,
precision, recall and F1-score. In particular, we report
the results obtained by varying the bag type (fixed or variable size, as in Section 2.3), the balancing
technique (Section 3.1) and the voting approach (Section 3), highlighting in bold the top-performing
value in each column. It is interesting to note that the best approach turns out to be the MIL-SVM trained
on variable-size bags, with all the other approaches achieving significantly lower results. This result was
somewhat expected, as the problem is by nature multi-instance with variable-size bags.
To better frame the results achieved by using the proposed approach, we compared it against the only
competitor available on the considered dataset at the time of writing. The project, publicly available on
GitHub (github.com/alodieboissonnet/EmotionRecognitionKeystrokeDynamics), introduces a new set of
features based on edit distances to capture the number of typos typed by the subject, assessing
their effectiveness for emotion recognition. It is worth noting that the competitor approach also extracts
some high-level features very similar to the ones defined in this work (such as D1U1 mean, D1U1 std,
D1U2 mean, D1U2 std, and so on). The main difference with the competitor approach is that it extracts
these features by considering the whole typed sentence (opposite to what we propose, by using a sliding
time window). As classification algorithm, the competitor approach uses XGBoost (Chen et al., 2015).
Table 5: Comparison of the best performing proposed ap-
proaches versus the considered competitor, in terms of clas-
sification accuracy (Acc), precision (Pre), recall (Rec) and
F1-score (F1).
Approach Acc Pre Rec F1
Proposed approach 0.76 0.80 0.69 0.74
Competitor 0.80 0.81 0.80 0.79
Table 6: Per-class comparison of the best performing pro-
posed approaches versus the considered competitor.
Angry Calm Happy Neutral Sad Competitor
Accuracy 0.56 0.70 0.80 0.93 0.78 0.80
Precision 0.76 0.62 0.89 0.93 0.79 0.81
Recall 0.58 0.58 0.70 0.94 0.64 0.80
F1-Score 0.58 0.60 0.78 0.94 0.71 0.79
To ensure a fair comparison, we took care to perform the same subject selection and to use the very same
test set.
Table 5 reports the comparison of the best performing proposed approach versus the considered
competitor. Results show that the competitor performs slightly better than the proposed approach.
However, it is worth noting that while we analyse 15-second time windows, the competitor operates
on the whole typing session, which has, for the considered dataset, an average duration of 84 seconds.
This makes the competitor approach unsuited for the short-message scenario (e.g., Twitter, instant
messaging apps, etc.). To further analyse the proposed approach, in Table 6 we report the same information
organised per class. Interestingly, these results show that our approach performs comparably or better w.r.t.
the competitor for three classes, with “Angry” and “Calm” pulling down our average performance.
5 CONCLUSIONS
In this work we analysed the detection of users’ emo-
tional states based on the analysis of written text fo-
cusing on the case of short writing sessions (i.e., up
to a few seconds), typical of modern instant messag-
ing applications. To this aim, we leverage keystroke
dynamics, a behavioural biometric analysing habit-
ual typing patterns on a keyboard. In particular, we
introduced a time-windowing approach that allows
analysing users’ writing sessions in different batches,
re-shaping the emotion recognition task into a multi-
instance problem. The obtained results suggest that
even very short writing windows (in the order of
30”) are sufficient to recognise the subject’s emo-
tional state with the same level of accuracy as systems
based on the analysis of larger writing sessions (up to
a few minutes). Although promising, the use of keystroke dynamics also presents some
challenges that need to be addressed, including possibly low generalisation (as the values of keystroke
parameters taken from a specific user may depend on the type of software used) and inconsistencies in the
users’ typing rhythm due to external factors (e.g., injury, fatigue, or distraction) rather than emotions.
Future works will focus on increasing the dataset
through some new data augmentation techniques, to
also balance the number of instances per class. Also,
we will investigate whether keystroke dynamics can
be combined with other biometrics or with other text-
based analyses (e.g., sentiment analysis) to further
improve the recognition performance. Finally, as in this work we only used the Fixed Text Dataset (Section
2), a further step will be testing the effectiveness of the proposed approaches on the Free Text Dataset,
which is related to the subjects’ typing rhythm as they wrote spontaneous sentences after watching the videos.
The hope is that this type of analysis would better integrate with users’ daily activities and, of course, with
chat message analysis, possibly providing more reliable, stable and precise predictions.
ACKNOWLEDGEMENTS
This work is supported by the Italian Ministry of Uni-
versity and Research (MUR) within the PRIN2017
- BullyBuster - A framework for bullying and cy-
berbullying action detection by computer vision and
artificial intelligence methods and algorithms (CUP:
F74I19000370001). The authors are also grateful to the SCoPE group of the University of Naples
Federico II for the given support.
REFERENCES
Andrews, S., Tsochantaridis, I., and Hofmann, T. (2002).
Support vector machines for multiple-instance learn-
ing. Advances in neural information processing sys-
tems, 15.
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer,
W. P. (2002). SMOTE: synthetic minority over-
sampling technique. Journal of Artificial Intelligence
Research, 16:321–357.
Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y.,
Cho, H., Chen, K., et al. (2015). XGBoost: extreme
gradient boosting. R package version 0.4-2, 1(4):1–4.
Doran, G. and Ray, S. (2014). A theoretical and empir-
ical analysis of support vector machine methods for
multiple-instance classification. Machine learning,
97(1):79–102.
Foulds, J. and Frank, E. (2010). A review of multi-instance
learning assumptions. The knowledge engineering re-
view, 25(1):1–25.
Fragopanagos, N. and Taylor, J. G. (2005). Emotion recog-
nition in human–computer interaction. Neural Net-
works, 18(4):389–405.
Gärtner, T., Flach, P. A., Kowalczyk, A., and Smola, A. J.
(2002). Multi-instance kernels. In ICML, volume 2,
page 7.
Jain, A., Hong, L., and Pankanti, S. (2000). Biometric iden-
tification. Communications of the ACM, 43(2):90–98.
Karnan, M., Akila, M., and Krishnaraj, N. (2011). Biomet-
ric personal authentication using keystroke dynamics:
A review. Applied soft computing, 11(2):1565–1573.
Maalej, A. and Kallel, I. (2020). EmoSurv: A typing bio-
metric (keystroke dynamics) dataset with emotion la-
bels created using computer keyboards.
Mandryk, R. L. and Nacke, L. E. (2016). Biometrics in
gaming and entertainment technologies. In Biomet-
rics in a Data Driven World, pages 215–248. Chap-
man and Hall/CRC.
Sansone, C. and Sperlí, G. (2021). A survey about
the cyberbullying problem on social media by us-
ing machine learning approaches. In International
Conference on Pattern Recognition, pages 672–682.
Springer.
Tao, J. and Tan, T. (2005). Affective computing: A review.
In International Conference on Affective computing
and intelligent interaction, pages 981–995. Springer.