Identity Security in Biometric Systems based on Keystroking

∗

Lucjan Hanzlik and Wojciech Wodo

Faculty of Fundamental Problems of Technology, Wroclaw University of Technology, Wroclaw, Poland

Keywords:

Identity, Security, Keystroking, Biometry, Keylogger, Impersonation.

Abstract:

The most valuable element of a biometric security system is the personal features of its users. Characteristics of individuals are unique and must be protected. In this paper we focus on methods of protecting user identity in systems based on keystroking. Our approach aims to give minimal information to adversaries while keeping the system as responsive as possible, regardless of the user representation or intended usage. We consider keystroking not only in the context of a keyboard, but also of a touch screen, pin pad and any other input device that could be used for typing. As results we present several complete security solutions that are applicable to software as well as hardware systems.

1 INTRODUCTION

Biometric security systems have been used for many

years. They are based on physical and behavioral

characteristics inherent in the body. Keystroke dy-

namics is a behavioral biometric feature deﬁned by

our way of typing. Among its advantages, it does not

require an additional reader or scanner because every

computer is equipped with some kind of keyboard.

Another advantage is that keystroke dynamics can be

used for continuous authentication of the user; that is,

as long as the user is at the workstation, the system

can continuously verify his identity. Numerous papers consider the effectiveness of authorization or identification systems (Monrose and Rubin, 1997), the reduction of the False Acceptance Rate or False Rejection Rate, more accurate models of users (Bergadano et al., 2002) and other issues connected with the legitimate use of the system. In this paper we focus instead on protecting the keystroke identity of individuals. Once stolen, biometric data can be used indefinitely, so it is important to secure this information.

Structure and Contribution of Paper. In Section 2 we describe briefly how keystroking systems work, what kind of data we collect and how we measure these data using toolkits (keyloggers) that we developed for intercepting raw data from the keyboard. We also identify security threats related to the stealing of user identity and impersonation. Section 3 is devoted to issues connected with protecting an individual’s identity. There we present our protection algorithms with implementation details and consider many use cases. A hardware approach to the problem of identity protection is discussed in Section 4. There we also develop our microprocessor device for USB keyboards and consider protection of Smart Card terminals/readers. Analysis of the algorithms’ effectiveness is described in Section 5. The final Section 6 contains our conclusions and recommendations for future work.

∗ This research was supported by European Union POKL 4.1.1 Mloda Kadra and FNP Ventures/2012-9/4 scientific projects.

Related Work. There are many papers and implementations based on keystroke dynamics. Authors mostly consider classification algorithms (Zahid et al., 2009) or the extraction of features from the input text (Zhong et al., 2012). A discussion of the discriminability of digraphs and trigraphs for free text can be found in (Sim and Janakiraman, 2007). An interesting approach to identity verification based on the analysis of the typing rhythms of individuals on different texts is presented in (Bergadano et al., 2003), but to our knowledge the first attempt to implement software protection from a bot attacker is in (Stefan et al., 2012). Work on the security of user identity has been carried out in part by (Klonowski et al., 2012) and (Wodo, 2012).

2 KEYSTROKING SYSTEM

Every keystroking system first collects data from the user and checks its compliance during authentication or identification. Data gathering can be performed once (e.g. the user types his login and password a dozen times, or a free-text sample of a required length) or every time the individual uses the system. The latter can be used for continuous authentication to ensure that only legitimate users have access to a given resource, e.g. a workstation. The entered data is processed as digraphs (two consecutive keys) or longer n-graphs, depending on whether the approach considers context.

Hanzlik L. and Wodo W.

Identity Security in Biometric Systems based on Keystroking.

DOI: 10.5220/0004526405240530

In Proceedings of the 10th International Conference on Security and Cryptography (SECRYPT-2013), pages 524-530

ISBN: 978-989-8565-73-0

© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

2.1 Data Collection

In the case of keystroking dynamics the user infor-

mation can be derived from timings coming from a

keyboard or other input device, e.g., touch screen or

pin pad. It is essential to gain access to raw data in

real-time, because we operate on time measured in

milliseconds. Any disruption or additional signal pro-

cessing can alter timings. We focused on gathering

the keyboard events keyUp and keyDown consisting

of the key pressed or released and the timestamp of

each event. This allows us to calculate dwell time and

ﬂight time timings for each digraph or even each n-

graph.
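As an illustration of the timing calculation, the following minimal sketch derives dwell and flight times from a stream of key events. It assumes the common definitions (dwell time: keyDown to keyUp of the same key; flight time: keyUp of one key to keyDown of the next), which the text above does not spell out. The event tuple layout and the name `dwell_and_flight` are hypothetical.

```python
from typing import List, Tuple

# Assumed event layout: (timestamp_ms, kind, key), kind is "down" or "up".
Event = Tuple[int, str, str]

def dwell_and_flight(events: List[Event]):
    """Return (dwell, flight) lists from a time-ordered event stream."""
    pending = {}   # key -> index in strokes of its still-open press
    strokes = []   # [key, down_ts, up_ts], kept in keyDown order
    for ts, kind, key in events:
        if kind == "down" and key not in pending:   # repeated downs are ignored
            pending[key] = len(strokes)
            strokes.append([key, ts, None])
        elif kind == "up" and key in pending:
            strokes[pending.pop(key)][2] = ts
    complete = [s for s in strokes if s[2] is not None]
    # Dwell: how long each key was held.
    dwell = [(k, up - down) for k, down, up in complete]
    # Flight: release of one key to press of the next (negative on overlap).
    flight = [(a[0] + b[0], b[1] - a[2]) for a, b in zip(complete, complete[1:])]
    return dwell, flight

events = [(0, "down", "t"), (90, "up", "t"), (150, "down", "h"), (230, "up", "h")]
dwell, flight = dwell_and_flight(events)
```

For the digraph th in this example the sketch yields dwell times of 90 ms and 80 ms and a flight time of 60 ms. When the second key is pressed before the first is released, the flight time under this definition becomes negative.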

One way to collect such data is to install a hardware keylogger on a USB or PS/2 port and communicate with it by simulating keyboard events. Another approach, the one we chose, is to intercept keyboard events in software before the operating system processes them. This solution is highly dependent on the OS; one can, for example, write a new handler for the keyboard interrupt, install a daemon/service or hook into the kernel.

We developed software toolkits to facilitate the process of data collection for the Windows XP/7 and Linux operating systems. The implementation for Windows XP uses WinAPI’s SetWindowsHookEx() function to attach a global WH_JOURNALRECORD hook to the message queue and the GetSystemTime() function to obtain timestamps. For Windows 7 we simply used the GetAsyncKeyState() function to get all keyboard events. For the Linux operating system we captured raw keyboard data from the event device node, for example /dev/input/event1.
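The Linux variant can be sketched as follows. This is not the authors' toolkit; it is a minimal parser that assumes the records read from the event device follow the classic `struct input_event` layout (a `struct timeval` of two native longs, then a 16-bit type, a 16-bit code and a 32-bit value). The function name `parse_key_events` is hypothetical.

```python
import struct

# Assumed record layout: native long sec, native long usec,
# unsigned short type, unsigned short code, signed int value.
EVENT = struct.Struct("llHHi")
EV_KEY = 0x01  # event type used for key presses and releases

def parse_key_events(raw: bytes):
    """Yield (timestamp_ms, kind, code) for each key press/release in raw."""
    kinds = {0: "up", 1: "down"}  # value 2 (auto-repeat) is skipped
    for offset in range(0, len(raw) - EVENT.size + 1, EVENT.size):
        sec, usec, etype, code, value = EVENT.unpack_from(raw, offset)
        if etype == EV_KEY and value in kinds:
            yield sec * 1000 + usec // 1000, kinds[value], code

# Synthetic records stand in for bytes read from the device node.
raw = (EVENT.pack(1, 500000, EV_KEY, 30, 1)
       + EVENT.pack(1, 590000, EV_KEY, 30, 0)
       + EVENT.pack(1, 590000, 0, 0, 0))   # synchronization record, ignored
parsed = list(parse_key_events(raw))
```

In practice the bytes would come from opening the device node in binary mode, which usually requires elevated privileges. The record size depends on the platform, so the format string should be checked against the target system.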

2.2 User’s Representation

Generally, our considerations are independent of the adopted model of user representation, because we work on raw data coming directly from an input device. However, when we want to impersonate a user, we have to take into account the way keystroke data is stored and transformed (see Section 3.4).

For each digraph we construct a histogram of timings (divided into 1 ms slots for dwell time and flight time). We reduce outliers and interpolate the gaps among the most common values in order to avoid sequences of identical timings (which can occur when there is too little data). Finally, we normalize these histograms and use them as probability distributions of particular timings.
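The histogram construction can be sketched as below. The exact outlier rule and interpolation scheme are not given in the text, so both are assumptions here: samples outside a fixed window are dropped, and empty 1 ms slots between observed values are filled by linear interpolation. The name `timing_distribution` and the parameter `max_ms` are hypothetical.

```python
from collections import Counter

def timing_distribution(samples_ms, max_ms=500):
    """Turn raw timings for one digraph into probabilities over 1 ms slots."""
    # Assumed outlier rule: keep only samples inside [0, max_ms].
    kept = [t for t in samples_ms if 0 <= t <= max_ms]
    if not kept:
        return {}
    counts = Counter(kept)
    filled = {}
    prev = min(counts)
    for t in sorted(counts):
        # Assumed gap filling: linear interpolation between observed slots.
        for gap in range(prev + 1, t):
            frac = (gap - prev) / (t - prev)
            filled[gap] = counts[prev] + frac * (counts[t] - counts[prev])
        filled[t] = float(counts[t])
        prev = t
    # Normalize so that the slot weights sum to 1.
    total = sum(filled.values())
    return {t: c / total for t, c in filled.items()}

dist = timing_distribution([100, 100, 102, 900])
```

Here the 900 ms sample is discarded as an outlier and the empty 101 ms slot receives an interpolated weight between its two neighbours.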

2.3 Security Threat

There are risks associated with the use of keystroke

dynamics in authorization systems. A user’s

keystroke identity is permanent like other biometric

features. It is not an easy task for a person to change

his typing rhythm. Thus, once an adversary captures

a user’s identity he may use it multiple times to gain

access to the system. This is a security threat for

systems based on authorization by keystroke dynam-

ics since such systems provide no additional secu-

rity. Other security measures must be applied to make

those systems secure.

Another security threat concerns the privacy of

users. As shown in (Klonowski et al., 2012) the

keystroke identity may be used to uniquely identify

a user. An adversary may capture the data ﬂowing

to a system and create proﬁles of its users. Since the

adversary is able to create such proﬁles he may dis-

tinguish between users and break the anonymity of

systems that are designed to preserve it. Note that this may also be a privacy threat concerning electronic IDs. An adversary may capture the pin pad input while the user is typing his PIN. Thus, even if the communication and authorization protocols for eIDs preserve privacy, such side-channel attacks may leak information about users’ activities.

3 IDENTITY PROTECTION

In this section we focus on protecting the biometric data of legitimate users, which could be intercepted during daily tasks. As mentioned in Section 2.3, once stolen, biometric data can be used indefinitely to impersonate a user because it is strongly correlated with the individual. Our aim is to change the user’s original timings and thereby give the attacker no information about the user’s typing rhythm, or even about the fact that any protection system is in place. We want the adversary to think he is gaining valuable information when in fact he is collecting useless data. The longer this situation persists the better for

IdentitySecurityinBiometricSystemsbasedonKeystroking

525

the user because if the attacker is unaware of the pres-

ence of security software, he will not try to disable

it.

All presented solutions are based on a buffer and

delay algorithm, and therefore require access to time

indicators. Keyboard events (key down and key up)

are intercepted before reaching the application level

of the operating system and are stored in memory.

A timestamp of each event is recorded, and after

an appropriate delay the event is released, similar to

(Klonowski et al., 2012).

In our research we have considered two types of

timings, i.e., ﬂight time and dwell time, in order to

provide the best possible security.

3.1 Constant Delay Algorithm

This kind of algorithm seems to be the easiest to implement. It works by simply gathering keyboard events in a buffer (queue) and releasing them one by one, each after a period d has passed. Parameter d regulates the delay and the obfuscation threshold. For timings below threshold d the algorithm works perfectly, generating uniform sequences of timings for the typed data. Above this threshold it works in immediate mode (that is, it does not delay events) to keep the system responsive, and does not protect the user. Such an approach provides a very high level of security for at least moderately skilled typists. Unfortunately, it is useless for people who cannot maintain a typing rhythm faster than the set threshold.

According to both our research and (Song et al., 2001), the vast majority of flight times are below 400–500 ms, and the average flight time is about 200 ms. For that reason we recommend setting d between 200 and 250 ms. Such a setting allows one to obfuscate almost every timing. Recall that a delay is generated after every event, i.e. key down and key up. So when typing the digraph th, we would have the following sequence of events: keyDown(t), keyUp(t), keyDown(h), keyUp(h). Events can also be generated in another order, because the user can hold one key down while pressing another, so we have to consider this scenario as well. In practice, this solution gives the user about 400–500 ms to type the next character of the input text while still protecting his identity. For a well-trained typist or a beginner, parameter d can be adjusted to avoid loss of fluency for the former and loss of protection for the latter. We implement these considerations in Algorithm 1.

Algorithm 1: CONSTANT TIME DELAY.

Input: d: 10-bit integer (delay in [ms])

begin
    Buffer : Queue ;
    while wait for signal T do
        T ← INPUT ;
        if Buffer.Empty() then
            T → OUTPUT ;
            Set_Timer(d) ;
        else
            Buffer.Enqueue(T) ;

proc Timer_Timeout ;
begin
    Buffer.Dequeue() → OUTPUT ;
    if !Buffer.Empty() then
        Set_Timer(d) ;
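The effect of the constant delay scheme on a recorded stream can be illustrated offline. The sketch below is one reading of Algorithm 1, not a transcription: it assumes that an event arriving while the timer is still pending is always buffered, so that consecutive released events are at least d apart, and that an event arriving later than that is passed through immediately. The name `constant_delay` is hypothetical.

```python
def constant_delay(arrivals_ms, d):
    """Return release times for events arriving at sorted times arrivals_ms."""
    releases = []
    next_free = None  # earliest moment the next event may be released
    for t in arrivals_ms:
        if next_free is None or t >= next_free:
            release = t          # immediate mode: the gap already exceeds d
        else:
            release = next_free  # buffered until the pending timer fires
        releases.append(release)
        next_free = release + d
    return releases

# Three quick keystrokes followed by a long pause, with d = 200 ms.
released = constant_delay([0, 50, 120, 1000], 200)
```

The first three events leave at 0, 200 and 400 ms, so their original spacing is hidden. The event after the long pause passes through unchanged at 1000 ms, which is the unprotected immediate mode described above.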

3.2 Random Noise Algorithm

It is obvious that the protection provided by Algorithm 1 is easily detected by the adversary. In order to make our protection system stealthier, we consider a modification of the previously described algorithm. First, we decide for each keyboard event whether it will be delayed or not (with probability 0.5). Next, for each delayed event we draw a timing extension between 15 and 70 ms. Such an approach creates the impression of typing by a real user. This method is applicable only to well-trained typists, because the random noise algorithm only slightly affects a user’s timings and therefore cannot provide a long delay. In fact, this is not a true random noise approach, because we do not have the option to shorten timings: we modify delays on the fly, so the algorithm is more like a moving threshold method. Algorithm 2 includes the above assumptions.
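The random noise scheme described above can be sketched in the same offline style. The sketch assumes that the extension is drawn uniformly from the integers 15 to 70 inclusive and that an event arriving before the current extension has elapsed is held back until it has. Neither detail is fixed by the text, and the name `random_noise_delay` is hypothetical.

```python
import random

def random_noise_delay(arrivals_ms, rng=None, low=15, high=70):
    """Return (releases, gaps): release times and the extension drawn after each."""
    rng = rng or random.Random()
    releases, gaps = [], []
    next_free = None  # earliest moment the next event may be released
    for t in arrivals_ms:
        release = t if next_free is None or t >= next_free else next_free
        # With probability 0.5 hold the following event back by 15-70 ms.
        gap = rng.randint(low, high) if rng.random() >= 0.5 else 0
        releases.append(release)
        gaps.append(gap)
        next_free = release + gap
    return releases, gaps

arrivals = [0, 40, 75, 130, 600]
releases, gaps = random_noise_delay(arrivals, random.Random(7))
```

Events are never released earlier than they arrived, which reflects the remark that timings can only be lengthened, never shortened.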

Algorithm 2: RANDOM NOISE DELAY.

begin
    Buffer : Queue ;
    while wait for signal T do
        T ← INPUT ;
        if Buffer.Empty() then
            T → OUTPUT ;
            Timer_Evoke() ;
        else
            Buffer.Enqueue(T) ;

proc Timer_Evoke ;
begin
    if Random(0,1) ≥ 0.5 then
        d ← PickAtRandom(15, 70) ;
    else
        d ← 0 ;
    Set_Timer(d) ;

proc Timer_Timeout ;
begin
    Buffer.Dequeue() → OUTPUT ;
    if !Buffer.Empty() then
        Timer_Evoke() ;

3.3 PUF Method

The physical unclonable function (PUF) is a function that is easy to evaluate but hard to predict. In addition, it is easy to create a PUF device but impossible (in practice) to make a copy of it. We show how to use those features to create Algorithm 3, which changes the keystroke identity of a user in a static and unclonable way. The idea of this solution is that a user may create two or more keystroke identities (using PUF based devices) and switch between them (using, for example, passwords or private logins) whenever he likes. Even if an adversary has access to the device he is unable to clone it.

Let $\varphi : \{0,1\}^{h+7} \to \{0,1\}^{h+7}$ be a PUF function and let $H : \{0,1\}^{*} \to \{0,1\}^{h}$ be a hash function, e.g. SHA1 with $h = 160$. Algorithm 3 takes as input a password pass, computes the real time gap between two consecutive keyboard events (stored in a buffer) and changes this time using the password and the PUF function. For reasons of space we show only the function used to transform the timing between two consecutive keyboard events.

Algorithm 3: PUF BASED TRANSFORMATION.

Input: pass : String, time : 10-bit integer

begin
    proc Trans_Time ;
    begin
        pass' ← H(pass) ;
        res ← φ(pass' || time_6 … time_0) ;
        return time_9 time_8 time_7 res_6 … res_0

Note that we do not change the three most significant bits of the timings. Thus, the Trans_Time procedure computes a value which differs from the original one by less than 128 ms. However, such small differences change the keystroke characteristics of a user and make it difficult for the adversary to determine whether the user uses protection against attacks. Notice further that if the adversary has access to the PUF based device of a user, he may send arbitrary challenges to the φ function and store the results. However, because the passwords for each identity are secret, the adversary would have to guess the pass' value for the user’s identities or challenge all possible values, which is computationally infeasible for a large h. Thus, unless the user’s passwords are known, an adversary is not able to clone the user’s device or to link the user’s identities.
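The data flow of the transformation can be illustrated in software. A real PUF is a hardware primitive, so the sketch below substitutes HMAC-SHA1 keyed with a hypothetical per-device secret purely as a stand-in; this substitution is an assumption made for illustration and does not provide unclonability. The sketch also assumes that the seven low bits of the timing are the ones fed to the function and replaced, which is consistent with the statement that the three most significant bits stay unchanged.

```python
import hashlib
import hmac

def trans_time(password: str, time_ms: int, device_secret: bytes) -> int:
    """Replace the 7 low bits of a 10-bit timing, keeping the 3 high bits."""
    if not 0 <= time_ms < 1024:
        raise ValueError("time_ms must fit in 10 bits")
    # pass' = H(pass), with SHA1 as the example hash (h = 160).
    pass_hash = hashlib.sha1(password.encode("utf-8")).digest()
    low7 = time_ms & 0x7F
    # Stand-in for the PUF: HMAC keyed with a per-device secret (assumption).
    res = hmac.new(device_secret, pass_hash + bytes([low7]), hashlib.sha1).digest()
    # Keep bits 9..7 of the timing, take 7 bits of the response as bits 6..0.
    return (time_ms & 0x380) | (res[-1] & 0x7F)

original = 300
changed = trans_time("first identity", original, b"device-A")
```

The same password, timing and device always give the same output, so the transformed identity is static, while a different password selects a different mapping of the low bits.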

3.4 Pretending to Be Another User Algorithm

This solution cannot be implemented in real time, because it requires a sequence of characters in order to calculate new timings. The algorithm works as follows: first, it buffers keystrokes until synchronization, which is signaled by a break in typing (≥ 500 ms). Next, new timings are calculated on the basis of data previously gathered from other users (see Section 2.2). Finally, the algorithm flushes the buffered keystrokes with the modified delays. In essence, we replace the original timing of each digraph with one taken from the distribution defined for another user.

We recommend this solution in cases where real-

time communication is not critical and we can afford

to capture text ﬁrst and later inject it (e.g. login form).

An adversary will have serious problems identifying

this security method because the algorithm pretends

to be another individual. Generally this solution pro-

vides high security and its quality depends only on the

user’s data gathered previously.
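The re-timing step can be sketched as follows, assuming that the other user's profile is available as a mapping from digraph to a timing distribution of the kind described in Section 2.2. The profile layout, the fallback of 200 ms for digraphs missing from the profile and the name `retime` are all hypothetical.

```python
import random

def retime(text, profile, rng=None, default_ms=200):
    """Return (char, delay_before_char_ms) pairs drawn from another user's profile."""
    rng = rng or random.Random()
    out = [(text[0], 0)] if text else []
    for prev, cur in zip(text, text[1:]):
        dist = profile.get(prev + cur)
        if dist:
            # Sample one timing according to the stored probabilities.
            timings, weights = zip(*dist.items())
            delay = rng.choices(timings, weights=weights, k=1)[0]
        else:
            delay = default_ms  # assumed fallback for unseen digraphs
        out.append((cur, delay))
    return out

profile = {"th": {80: 0.5, 95: 0.5}}   # toy profile of the impersonated user
replayed = retime("the", profile, random.Random(1))
```

The buffered text is then injected with the sampled delays instead of the original ones, so an observer sees timings that follow the other user's distribution.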

3.5 Binary Representation Probability Algorithm

We use the idea of Algorithm 3, i.e. changing the gap between two consecutive keyboard events. However, in Algorithm 4 we use a probabilistic change. The algorithm takes as input a parameter $p \in (0, 1)$ and flips the $i$-th bit of the gap with probability $p^{i+1}$. Note that as the significance of the bits rises, the probability of a change falls.
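The bit-flipping rule can be written directly, with bit 0 taken as the least significant bit of the 10-bit gap. In this minimal sketch the random source is passed in as a parameter (a hypothetical convenience, named `rand` here) so that the behaviour can be checked deterministically.

```python
import random

def flip_bits(time_ms, p, rand=random.random):
    """Flip bit i of a 10-bit gap with probability p ** (i + 1)."""
    res = time_ms
    for i in range(10):
        # Higher bits are flipped less often because p ** (i + 1) shrinks.
        if rand() < p ** (i + 1):
            res ^= 1 << i
    return res

always = flip_bits(300, 0.5, rand=lambda: 0.0)    # every draw is below p**(i+1)
never = flip_bits(300, 0.5, rand=lambda: 0.99)    # no draw is below p**(i+1)
```

With p = 0.5 the lowest bit is flipped half of the time, while the highest bit is flipped with probability 0.5^10, so large changes to a timing are rare.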

Algorithm 4: BINARY REPRESENTATION PROBABILITY.

Input: p: double

proc Trans_Time ;
begin
    res ← time ;
    for i ∈ 0, …, 9 do
        r ← Rand(0, 1) ;
        if r < p^(i+1) then
            res_i ← res_i ⊕ 1
    return res_9 … res_0

4 HARDWARE DEVICE

In order to provide protection of a user’s biometric identity we could implement our security algorithms in external hardware devices (similar to a hardware keylogger) or include them in internal microprocessor systems. Such an approach guarantees very high security, because an adversary must have physical access to the device in order to disable the protection or obtain raw data. This solution also eliminates any kind of software attack, because the application layer receives data that has already been modified. Command communication between the security unit and the user can be achieved by simulating input device events (e.g. keyboard keystrokes). This special service mode could be invoked by entering a code sequence, cf. (Wodo, 2012).

4.1 Keyboard Security Unit

We propose a simple microprocessor device to protect the keyboard. It works as a man-in-the-middle unit, plugged in between the workstation and the keyboard. This solution could be implemented in the form of a USB or PS/2 connector, which transforms the signal and adds appropriate delays. Each of the algorithms presented earlier may be adapted for this kind of device (depending, of course, on the capabilities of the selected microprocessor).

In our opinion such a solution provides the highest level of security because it completely prevents any adversary from influencing the protection system. In addition, it is much quicker than a software application and does not use computer resources. The drawback of this method is that it can be applied only to externally connected input devices.

4.2 Smart Card Security

We stated earlier that the anonymity of eID users may be threatened by an adversary that captures the pin pad input. Note that for protection we must use an off-card solution, since the user types his PIN on the terminal/reader side. The adversary may be the terminal owner or producer, so any countermeasures applied inside the terminal may also be unsafe. One possible solution is for users to use their own pin-pad-like devices which simulate the input of the user on the real pad. This approach has some disadvantages. For example, users would have to carry additional baggage, and there is also the issue of powering such a device. Another possibility is to use an additional RFID based Smart Card. A user would first have to insert his eID into the terminal/reader and then use the RFID card (with a unique number stored in it) to authorize the transaction instead of typing his PIN. Note that this solution is much cheaper and less complex than the first one. However, in some cases it is desirable to demand that a user types the PIN and that the transaction is not performed by a computer (which might be the case if we use the second solution).

5 EFFICIENCY ANALYSIS

In this section we present and discuss the results of our tests. We captured and recorded sample data (about 14000 key events) using our keylogger-like program (see Section 2.1). The timings between corresponding keyboard events (i.e. key up, key down) are presented in Figure 1. Note that the data was truncated to 300 ms because, for our test user, a higher timing corresponds to a pause in typing.

Figure 1: Unmodiﬁed Timings.

Then, we transformed the raw data according to the algorithms presented in Section 3. Note that the constant time delay algorithm gives the best security level. It can be seen in Figure 2 that events occur with constant delays. Thus, the adversary receives no information about the typing rhythm of the user. However, he is aware that protection is being used by the individual.

As we can observe from Figure 2, the random noise delay algorithm obfuscates timings with a certain probability (p) by adding a random delay of the order of dozens of milliseconds. An adversary cannot recognize which timings were delayed, which results in a high security level. Moreover, because the changed timing distribution is similar to that of a real user, it is much harder to determine that any protection is in use. Admittedly, this method significantly affects short timings, so it is more appropriate for well-trained typists. Manipulating p shifts the occurrences of real timings to higher values and decreases the number of short timings.

The PUF based algorithm, as can be seen from Figure 2, changes the occurrence of keyboard events in a significant but deterministic way. Note that different passwords would give different results. Such characteristics on their own give the adversary no substantial information about the user or whether he is using any protection. However, a context analysis of the data discloses that protection is used. It may happen that, for a given PUF function, two occurrences of a digraph (e.g. ok) differ by about 1–5 ms in the original message, while after the transformation the difference is much higher. Thus, this solution may significantly change the variation of some digraphs, making it easy for an adversary to detect this kind of protection.

Note that the binary representation probability algorithm flattens the histogram. As can be observed from Figure 2 (probability p = 0.5), as the probability p increases the timing distribution becomes more similar to a uniform distribution. Obviously, the adversary is able to detect that the user is using protection if we use a higher probability p. However, it should be harder for the adversary to gain any information about the identity of the user as the histogram becomes flatter.

Figure 2: Modiﬁed Timings.

6 CONCLUSIONS

In this paper we consider security algorithms that work mostly in real time, and what makes them difficult to implement in real environments (the responsiveness of the system). If we could record the whole stream of data and replay it with modified timings, we would be able to create any sequence of keyboard events we want.

The presented solutions provide, in our opinion, a high security level, but could affect the use of keystroking as a method of verifying identity. This is a tradeoff between usability and security. For instance, the constant time delay algorithm destroys the possibility of distinguishing users, whereas the PUF based algorithm only transforms one user identity into another. We consider protection of identity regardless of the user’s representation model, which means that the quality of the intercepted information depends on the adversary’s assumptions (e.g. whether any protection is used and what kind of algorithm is used).

We highly recommend using a hardware-based so-

lution, if possible, because it has the following advan-

tages: (1) It cannot be disabled remotely; and (2) It

can perform more effective calculations without bur-

dening the CPU of the workstation.

REFERENCES

Bergadano, F., Gunetti, D., and Picardi, C. (2002). User authentication through keystroke dynamics. ACM Trans. Inf. Syst. Secur., 5(4):367–397.

Bergadano, F., Gunetti, D., and Picardi, C. (2003). Identity verification through dynamic keystroke analysis. Intell. Data Anal., 7(5):469–496.

Klonowski, M., Syga, P., and Wodo, W. (2012). Some remarks on keystroke dynamics - global surveillance, retrieving information and simple countermeasures. In SECRYPT, pages 296–301.

Monrose, F. and Rubin, A. D. (1997). Authentication via keystroke dynamics. In ACM Conference on Computer and Communications Security, pages 48–56.

Sim, T. and Janakiraman, R. (2007). Are digraphs good for free-text keystroke dynamics? In CVPR.

Song, D. X., Wagner, D., and Tian, X. (2001). Timing analysis of keystrokes and timing attacks on ssh. In Proceedings of the 10th conference on USENIX Security Symposium - Volume 10, pages 25–25, Berkeley, CA, USA. USENIX Association.

Stefan, D., Shu, X., and Yao, D. D. (2012). Robustness of keystroke-dynamics based biometrics against synthetic forgeries. Computers & Security, 31(1):109–121.

Wodo, W. (2012). Kradziez tozsamosci i podszywanie sie pod innych uzytkownikow w systemach biometrycznych opartych o keystroking. In Interdyscyplinarnosc badan naukowych 2012, pages 461–466, Wroclaw, Poland. Oficyna Wydawnicza Politechniki Wroclawskiej.

Zahid, S., Shahzad, M., Khayam, S. A., and Farooq, M. (2009). Keystroke-based user identification on smart phones. In Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection, RAID ’09, pages 224–243, Berlin, Heidelberg. Springer-Verlag.

Zhong, Y., Deng, Y., and Jain, A. K. (2012). Keystroke dynamics for user authentication. In CVPR Workshops, pages 117–123.