Device Fingerprinting: Analysis of Chosen Fingerprinting Methods

Anna Kobusińska, Jerzy Brzeziński and Kamil Pawulczuk

Institute of Computing Science, Poznań University of Technology, Piotrowo 3, Poznań, Poland

Keywords:

IoT, Big Data, Fingerprinting, Web Tracking, Security.

Abstract:

Device fingerprinting is a modern technique that uses available information to distinguish devices. Fingerprinting can be used as a replacement for storing user identifiers in cookies or local storage. In this paper, we identify features and corresponding optimal implementations that may enrich and improve Fingerprintjs2, an open-source fingerprinting library used daily by hundreds of websites. As a result, the paper provides noticeable progress in the analysis of fingerprinting solutions.

1 INTRODUCTION

Many online business models depend on the ability to distinguish one web visitor from another. Thus, web tracking has become essential to the World Wide Web. HTTP cookies (RFC, 2016), (Cahn et al., 2016) are heavily used for this purpose. Once a web page is requested, a cookie containing a unique identifier is stored on the user's computer. This practice is fundamental for many websites to ensure a high level of usability. At the same time, it is exploited by advertising companies to track user interests and hence increase the probability of purchase by serving personalized offers. Yet this mechanism has recently come under intense public scrutiny. Due to the continuous rise of privacy awareness in society, many people tend to either block cookies or regularly remove them from their computers. Forthcoming laws and directives have become a threat to the future use of this storage type. For these reasons, many alternatives have been considered. Various additional storage-based techniques are used daily, thanks to the successful adoption of the HTML5 specification (HTML5, 2016), which introduced additional APIs such as localStorage and indexedDB.

However, the past decade brought a more advanced invention, one that does not leave any data on the user's computer: e-fingerprinting. It can be even more powerful than human fingerprinting. When properly executed, the process may remain unnoticed. By collecting many small pieces of information about a specific device, one can try to distinguish it from others. Nowadays, given a set of random users, it is very unlikely that their devices, installed software or its settings will not differ in any way. Strong competition among hardware producers, daily software updates driven by the need to address the latest security threats, and strong personalization trends are just a few of the reasons why devices differ. That creates an opportunity for fingerprinting. Information such as the User-Agent header, screen resolution, hardware fingerprints (e.g. audio, canvas) or approximate location based on the IP address, once combined, holds invaluable identification properties. Such data is easily obtainable from JavaScript. Once the user opens a web page with a fingerprinting script attached, a user identifier can be generated. Simple queries to various APIs yield dozens of values that can be considered fingerprinting features. The simplest way to derive the final user identifier from the feature vector is to apply a hash function to all of the information concatenated into one string. If none of the fingerprints has changed between the user's visits, this hash will not differ between consecutive executions of the algorithm. Therefore, it can be treated as an identifier in the same way as cookie identifiers. Depending on the fingerprinting method used, such an identifier should be called a device, browser or user fingerprint. Nevertheless, the terms device fingerprint and browser fingerprint are often considered equivalent, since the boundary between them is thin.
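A minimal TypeScript sketch of this simplest scheme follows, assuming the features have already been collected into a flat key-value object (the feature names are hypothetical). It is not the Fingerprintjs2 code: a 32-bit FNV-1a hash is swapped in for brevity, whereas Fingerprintjs2 uses a MurmurHash3 variant. Keys are sorted so that the order in which features were collected does not affect the result.

```typescript
// Flat feature vector; the keys used with it are hypothetical examples.
type Features = { [key: string]: string | number | boolean | null };

// 32-bit FNV-1a over UTF-16 code units (equal to bytes for ASCII input).
// Multiplication by the FNV prime 16777619 = 2^24 + 2^8 + 0x93 is done with
// shifts and adds, so no ES2015 Math.imul is required.
function fnv1a32(input: string): string {
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Concatenate "key=value" pairs in sorted key order, then hash the string.
// The "|" separator is assumed not to occur inside feature values.
function fingerprintId(features: Features): string {
  const keys = Object.keys(features).sort();
  const parts = keys.map((k) => k + "=" + String(features[k]));
  return fnv1a32(parts.join("|"));
}
```

Because any single changed value yields a completely different hash, this scheme cannot follow gradual fingerprint evolution; that is the stability problem discussed in Section 2.3.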

Such fingerprinting scripts are already in use. Fingerprintjs2 (Fingerprints2, 2016) is an open-source fingerprinting solution that follows exactly the scenario described above. It is used by many, primarily with the aim of blocking abusive users. Augur (Augur, 2016) is a commercial solution providing device recognition based on a similar concept. Many advertising-related companies have already incorporated basic fingerprinting routines into their cookie-syncing scripts, which are the backbone of their businesses. All these examples collect fingerprints and generate the final identifiers on the client side. The most advanced solutions send the data to a server, which does the job of putting all the information together.

Kobusińska, A., Brzeziński, J. and Pawulczuk, K.
Device Fingerprinting: Analysis of Chosen Fingerprinting Methods.
DOI: 10.5220/0006375701670177
In Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security (IoTBDS 2017), pages 167-177
ISBN: 978-989-758-245-5

As the need for additional storageless techniques appeared, numerous fingerprinting studies were started. Most of them focus on evaluating yet another idea that could be turned into an additional fingerprint. They usually discuss issues related to diversity and stability, the primary challenges every solution has to face. It is important to collect as many independent fingerprints as possible, so that the samples are diverse enough to provide unique device recognition. On the other hand, because software, hardware and their settings evolve considerably fast, fingerprints change just as often, on a daily basis. Such changes have to be tracked and controlled by additional mechanisms, or unstable features have to be identified and excluded from the process.

Stability and diversity are the most important criteria for all fingerprint uses, yet many businesses are restricted by additional conditions, which are the focus of this work. The length of the executed code, the execution time and the length of the final fingerprint are crucial limitations of any real-time fingerprinting solution. So far, despite a noticeable need among the many companies trying to implement early solutions, these limitations have not been addressed by other theoretical studies. Thus, this study aims at implementing various methods of fingerprint collection and comparing them according to the most restrictive needs. In the paper, a wide discussion of available fingerprinting methods is conducted. A set of the most promising ones was chosen for evaluation and implemented within a fingerprinting environment. The developed script was executed in thousands of different user browsers in order to collect real fingerprinting data. This data was then subjected to extensive analysis. As a result of a cost-benefit evaluation, a set of features and their respective optimal fingerprinting implementations was chosen.

The paper is organized as follows. Section 2 describes the background of the topic: it explains web tracking and the available methods, and introduces the term fingerprinting together with its uses and challenges. Section 3 discusses the related literature. Section 4 presents the architecture developed for analyzing various fingerprint features. Next, Section 5 presents the obtained results and their discussion, while the last section draws final conclusions and proposes further steps.

2 DEVICE FINGERPRINTING BACKGROUND

Web tracking is commonly understood as assigning a unique and possibly stable identifier to each user visiting a website. The general purpose is to connect future page views of the same person or device with historical ones. Above all, it allows serving personalized content and restoring the visitor's context. The most common way of categorizing tracking is by whether it uses any client-side storage mechanism, i.e. into storage-based and storageless techniques.

2.1 Storage-based Techniques

The best-known representatives of this group are HTTP cookies (Cahn et al., 2016). According to Web Technology Survey statistics (Persistent, 2016), they are actively used on over 50% of websites globally. Half of them are persistent, meaning they remain on a visitor's computer after the browser is closed (until they expire or are deleted manually). Their rising popularity brought the topic of web privacy to public attention and dramatically raised people's awareness. Recent directives of the European Union, known as the Cookie Law (Low, 2016), require each website that uses this mechanism to openly notify its visitors about it. Thus, HTTP cookies are increasingly deleted by privacy-conscious users. Additionally, some browser vendors are starting to support this movement, e.g. Safari blocks third-party cookies by default to protect unwary users. All of that has made cookies relatively unreliable. Fortunately, there are many alternatives.

Much attention has recently been directed towards the Web Storage API, introduced in the newest HTML specification. It is already widely adopted by browsers and offers a cookie-like method of storing data, but for larger amounts. Usually, when the user requests cookie removal, this storage is not cleared, so the data remains. Therefore, Web Storage is considered a modern substitute for cookies when storing user identifiers more persistently.

ETags are identifiers assigned by a web server to specific versions of resources found under URLs (Fetterly et al., 2003). Whenever the content is modified, a new tag is assigned and sent together with the requested file. By exploiting this functionality, which is intended for cache validation, one can serve a different ETag to each visitor requesting a file and thus identify users: when the browser revalidates its cached copy, it sends the stored tag back in the If-None-Match header. The browser cache can be used similarly by serving files containing variable definitions of unique ids; these are read on the client side and attached to each further request. Local Shared Objects, known as Flash cookies, are another place to store data, as are Silverlight's Isolated Storage, Internet Explorer's userData storage and the HTML5 indexed database.
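The ETag trick can be sketched as a framework-independent server-side handler. The function and its parameters are hypothetical; only the HTTP semantics (the ETag response header, the If-None-Match request header and the 304 Not Modified status) are standard. Weak validators, lists of tags and cache expiry are ignored for brevity.

```typescript
interface EtagResponse {
  status: 200 | 304;
  etag: string; // value of the ETag response header
}

// Hypothetical handler for a tracking resource whose ETag doubles as a user id.
// First visit: a fresh id is minted and sent as the ETag with a 200 response.
// Later visits: the browser revalidates its cached copy and echoes the tag back
// in If-None-Match; answering 304 keeps the cached copy (and thus the id) alive.
function handleTrackedResource(
  ifNoneMatch: string | undefined,
  mintId: () => string,
): EtagResponse {
  if (ifNoneMatch) {
    return { status: 304, etag: ifNoneMatch }; // returning visitor: id = ifNoneMatch
  }
  return { status: 200, etag: '"' + mintId() + '"' }; // ETag values are quoted strings
}
```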

There are plenty of other mechanisms that could be exploited to store user identifiers; however, most of them have poor browser support or an infamous reputation, and given their history, reckless use could end in a lawsuit. The ultimate solution for storage-based tracking is the JavaScript Evercookie (Kamkar, 2016). This script produces extremely persistent cookies in the browser by using all possible storage methods at the same time. Whenever an identifier is removed from a particular source, it is recreated from the remaining ones.
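The respawning idea behind Evercookie can be sketched as follows. IdStore is a hypothetical adapter interface (cookies, localStorage, ETag- or cache-based stores would each need a wrapper), and keeping the most frequent surviving value is our assumption rather than Evercookie's exact policy.

```typescript
// Hypothetical adapter interface: each storage mechanism needs a wrapper exposing get/set.
interface IdStore {
  get(key: string): string | null;
  set(key: string, value: string): void;
}

// Reads the id from every store, keeps the most frequent surviving value
// (an assumed policy), and writes it back to all stores, respawning lost copies.
function respawnId(stores: IdStore[], key: string, newId: () => string): string {
  const counts: { [id: string]: number } = Object.create(null);
  for (const s of stores) {
    let v: string | null = null;
    try {
      v = s.get(key);
    } catch {
      v = null; // a blocked or failing store counts as empty
    }
    if (v !== null) counts[v] = (counts[v] || 0) + 1;
  }
  let best: string | null = null;
  for (const candidate of Object.keys(counts)) {
    if (best === null || counts[candidate] > counts[best]) best = candidate;
  }
  const id = best !== null ? best : newId();
  for (const s of stores) {
    try {
      s.set(key, id);
    } catch {
      // store unavailable: skip it
    }
  }
  return id;
}
```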

2.2 Storage-less Techniques

One category of methods that do not employ any storage is state-based techniques, also known as history stealing. As they are considered attacks, they are rarely seen across the web. CSS history knocking exploits the browser feature of marking visited links with a different color (usually purple instead of blue). With JavaScript, one can insert hyperlinks into the HTML DOM and test their CSS properties to determine whether the user has recently visited them. This attack originated in the past decade. Over time, browser vendors have worked to prevent the exploitation of such features: some queries for computed hyperlink styles now return false information about their appearance. In response, various timing attacks were invented to detect when browsers are trying to mislead. The battle between browsers and attackers continues today, in the name of user privacy.

Attribute-based and setting-based methods form the second half of storageless techniques. They are often referred to as fingerprinting (device, browser or user fingerprinting) (Yen et al., 2012), (Acar et al., 2013). Collecting as many small pieces of information as possible and then combining them yields reasonably unique device identification. Several categories of fingerprints can be distinguished: low-level fingerprinting, covering hardware (CPU or GPU measurement) and network fingerprinting (comparing TCP/ICMP/AJAX clock skew); information-based fingerprinting, i.e. collecting available information such as the User-Agent or JavaScript properties; and behavioral or biometric fingerprinting, i.e. measuring mouse movement, typing, etc. Alternatively, fingerprinting can be divided into two categories according to the execution mode: passive (collecting already available data) and active (measuring, tracking or actively querying in order to collect additional information).

While storage-based techniques are relatively easy to notice, fingerprinting represents the worst-case scenario for user privacy. It has the insidious property of not leaving any persistent evidence that a device identification process has occurred. Therefore, it has somewhat wider applications. Some of the most important ones (Webkit2016, 2016) are: identifying users on devices previously used for fraud, establishing a unique visitor count, advertising networks attempting to establish a unique click-through count, advertising networks attempting to profile users to increase ad relevance, profiling the behavior of unregistered users, linking the visits of users when they are both registered and unregistered, and identifying users who visit the site without authenticating.

2.3 Fingerprinting Obstacles

The primary obstacle a fingerprinting algorithm has to deal with is stability. Over time, the user's browser or device is upgraded, which causes some fingerprints to change their values. Ideally, one should approach this problem by tracking the changes. Once the browser is updated, the User-Agent header changes to a higher browser version string. Some of the installed add-ons are no longer supported and are therefore temporarily or permanently disabled. These are examples of fingerprint evolution. Such changes are mostly deterministic, so machine learning algorithms could be effective in following them (Yen et al., 2009), (Boda et al., 2011). Still, any abnormal user action, e.g. disabling cookies out of increased privacy awareness, installing a new font or changing the device location, brings an unpredictable shift that is hard to deal with. Such a change is likely to be detected only if it is minor.
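As a simple, non-learning alternative to the machine-learning approaches cited above, a changed fingerprint can be linked to the most similar known one when enough features agree. This is a sketch only: equal feature weights and the 0.9 threshold are our assumptions, not values taken from the cited studies.

```typescript
type Fp = { [feature: string]: string };

// Share of features (union of both key sets) whose values are identical.
function similarity(a: Fp, b: Fp): number {
  const keys: { [k: string]: boolean } = Object.create(null);
  Object.keys(a).forEach((k) => { keys[k] = true; });
  Object.keys(b).forEach((k) => { keys[k] = true; });
  const all = Object.keys(keys);
  if (all.length === 0) return 1;
  let same = 0;
  for (const k of all) if (a[k] === b[k]) same++;
  return same / all.length;
}

// Index of the most similar known fingerprint if it reaches the threshold,
// otherwise -1 (treated as a new device). The 0.9 default is an assumption.
function linkFingerprint(known: Fp[], candidate: Fp, threshold = 0.9): number {
  let bestIdx = -1;
  let bestScore = -1;
  known.forEach((k, i) => {
    const s = similarity(k, candidate);
    if (s >= threshold && s > bestScore) {
      bestIdx = i;
      bestScore = s;
    }
  });
  return bestIdx;
}
```

For example, a browser update that changes only the User-Agent out of ten collected features gives a similarity of 0.9 and is still linked, while a new font plus a new location plus an update would not be.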

All the information about a particular device collected during fingerprinting needs to be as unique as possible. Many machines share the same configuration and similar settings, so their fingerprints may be identical. Therefore, it is crucial to collect many diverse fingerprints.

Fingerprint diversity can be measured with a mathematical tool: entropy. A distribution of fingerprints has 20 bits of entropy if a randomly picked value is, on average, shared with only one in every $2^{20}$ devices. Entropy is defined as follows:

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)$$

where $X = (x_1, x_2, \ldots, x_n)$ is the set of observed feature values and $P(x_i)$ is the probability of observing $x_i$ under the discrete probability distribution. If a website is regularly visited by a set $X$ of different browsers with equal probability, the entropy reaches its maximum, $H(X) = \log_2 |X|$.
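The formula translates directly into code. The sketch below estimates each $P(x_i)$ as the relative frequency of the value in a sample; the function name is ours, and the natural logarithm is divided by ln 2 to obtain bits.

```typescript
// Shannon entropy in bits: H = -sum_i P(x_i) * log2 P(x_i),
// with each P(x_i) estimated as the relative frequency of value x_i.
function entropyBits(values: string[]): number {
  const counts: { [value: string]: number } = Object.create(null);
  for (const v of values) counts[v] = (counts[v] || 0) + 1;
  let h = 0;
  for (const v of Object.keys(counts)) {
    const p = counts[v] / values.length;
    h -= p * (Math.log(p) / Math.LN2); // log2 p without ES2015 Math.log2
  }
  return h;
}
```

Applied separately to each attribute, this gives the per-feature diversity used in Section 5; for $n$ equally frequent values it returns $\log_2 n$, matching the maximum given above.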

3 RELATED WORK

In 2010, the EFF published a reference study on browser fingerprinting (Eckersley, 2010). A relatively simple script was developed and used to collect over 470,000 samples, in which 18 bits of entropy were observed. In total, 83.6% of the browsers in the sample had a unique fingerprint. According to the study, fingerprints changed quite rapidly (the chance of at least one change within the first 24 hours reached 37.4%, rising to 80% after 15 days); however, the changes were relatively easy to track. Using a basic string similarity algorithm, 99.1% of the modifications were tracked correctly, with a false-positive rate of 0.86%. A forged User-Agent header was not enough to mislead the detection.

For several years, Princeton University, in cooperation with the Catholic University of Leuven, has been conducting relevant and valuable studies in the field of web privacy. Their 2014 paper (Acar et al., 2014), presenting the problems of canvas fingerprinting, cookie respawning and cookie syncing, brought serious media attention to these topics. Partially because of it, the share of crawled sites exploiting canvas fingerprinting dropped from 5.5% in 2014 to 1.6% in 2016. Cookie syncing analysis showed that only around a quarter of third-party scripts respect users who do not wish to be tracked (those who have either used opt-out cookies or set the Do Not Track header). The OpenWPM web privacy measurement framework, created for conducting large-scale privacy studies, is regularly used to analyze over a million top websites. According to recent results, tracking is especially popular among news websites. The only companies whose scripts were present on over 10% of the analyzed sites were the biggest players: Facebook, Google and Twitter. Nevertheless, browser add-ons such as Ghostery or uBlock Origin deal with those scripts quite effectively, except for very sophisticated and advanced ones that are hard to classify (for fingerprinting scripts, only around 60-70% are blocked). Canvas-based font fingerprinting was observed on 0.3% of websites, while local (NAT) IP address fingerprinting with the WebRTC API and audio fingerprinting were present on only about 0.06% of sites (Englehardt and Narayanan, 2016).

There are also plenty of websites aimed at raising tracking awareness among Internet users. Many online fingerprinting tools exposing various browser features have been developed (Frontier, 2016), (Cross-browser, 2016), (Kurent, 2016), (Tillmann, 2016); the fingerprints they collect are the subject of analysis in many similar studies. Moreover, some websites aim at helping users adjust their browsers' protection (BrowserSpy, 2016), (Checklist, 2016).

4 EMPIRICAL EVALUATION

The analysis environment consists of three parts: a fingerprinting script, a back-end service and analysis tools. To overcome the limitation of collecting fingerprints from a single dedicated web page, a script that can be attached to any website was created (which is, in fact, its target usage scenario). However, instead of being processed by the machine that serves the particular domain, the fingerprint is sent to a separate server responsible for data collection. Such a solution raises many technological issues that had to be addressed. They are discussed in this section together with a description of the setup. The general process of gathering fingerprint samples is presented in Figure 1. The script was exposed within an Amazon S3 bucket and could be linked from any website. When a user entered one of the collaborating web pages, the script was downloaded and executed as one of the page assets. The outcome was sent directly to the study server (Amazon EC2), which processed the data, appended back-end-side fingerprints (HTTP request headers) and eventually stored it in a DynamoDB database for further analysis. The statistics were generated with an analysis tool that fetched the data directly from Amazon.

Figure 1: Fingerprinting process scheme.

A fingerprinting script called bf.js was developed. Once triggered, it collects all implemented fingerprints and sends them to the server, where they are stored in the database.


While creating the fingerprinting framework, communication with the server was the first issue to be addressed. For security reasons, browsers restrict cross-origin HTTP requests initiated by scripts. Yet there are certain exceptions that could be exploited. For example, a request for an image carrying the data as a GET parameter could be sent. However, due to the character limit of URL parameters¹, none of these workarounds can carry payloads of around 100 KB, the average size of a fingerprint obtained with bf.js. Therefore, CORS-enabled AJAX requests were used to transfer the data to the server. With CORS, an additional preflight HTTP request is triggered (as the specification requires for such non-simple requests) before the actual request is made. This supplementary performance cost has to be kept in mind when evaluating the overall fingerprinting overhead.
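The transport decision can be sketched as follows. Both helpers are hypothetical; the 2,000-character URL budget is an assumed conservative value (Internet Explorer historically capped URLs at roughly 2,083 characters), and the preflight check simplifies the CORS rules to the request method and the Content-Type header, ignoring other non-safelisted headers.

```typescript
const MAX_URL_CHARS = 2000; // assumed conservative URL length budget

type Transport = "image-get" | "cors-post";

// Small payloads fit into the query string of an image request;
// larger ones (bf.js fingerprints averaged ~100 KB) need a POST body.
function chooseTransport(endpoint: string, payload: object): Transport {
  const url = endpoint + "?d=" + encodeURIComponent(JSON.stringify(payload));
  return url.length <= MAX_URL_CHARS ? "image-get" : "cors-post";
}

// Simplified CORS rule: only GET/HEAD/POST with a form or plain-text
// Content-Type avoid a preflight, so a JSON POST triggers one.
function needsPreflight(method: string, contentType: string | null): boolean {
  const simpleMethods = ["GET", "HEAD", "POST"];
  const simpleTypes = ["application/x-www-form-urlencoded", "multipart/form-data", "text/plain"];
  if (simpleMethods.indexOf(method.toUpperCase()) < 0) return true;
  if (contentType === null) return false;
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return simpleTypes.indexOf(mime) < 0;
}
```

In a browser, the large-payload branch would correspond to something like fetch(url, { method: "POST", mode: "cors", headers: { "Content-Type": "application/json" }, body: JSON.stringify(fp) }), which needsPreflight reports as preflighted; sending the same body as text/plain would avoid the extra round trip at the cost of a less explicit content type.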

As the script was likely to be linked on all sub-pages of the host website, the fingerprinting process would start each time the user navigated to or refreshed a page. To prevent that, a cookie mechanism was implemented: once fingerprinting completed, further executions were blocked for the next 3 minutes. Such a suspension still allowed tracking the long-term stability of fingerprints while preventing the database from being flooded with identical ones. This solution, as well as the use of the Web Storage API during fingerprinting, made it necessary to inform users about the use of storage mechanisms, in accordance with the European Union cookie law.
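The 3-minute suspension can be sketched as below. TimestampStore is a hypothetical abstraction; in bf.js it was backed by a cookie, and a cookie-backed store would wrap document.cookie. Alternatively, a cookie set with Max-Age=180 would let the browser expire the suspension by itself.

```typescript
// Hypothetical store for the time of the last completed run (a cookie in bf.js).
interface TimestampStore {
  read(): string | null;
  write(value: string): void;
}

const SUSPEND_MS = 3 * 60 * 1000; // 3-minute suspension, as described above

// True if no completed run was recorded, or the last one is at least 3 minutes old.
function mayRun(store: TimestampStore, nowMs: number): boolean {
  const raw = store.read();
  if (raw === null) return true;
  const last = Number(raw);
  return isNaN(last) || nowMs - last >= SUSPEND_MS; // unreadable value: run again
}

// Called only after fingerprinting has finished, so a failed run is retried.
function markCompleted(store: TimestampStore, nowMs: number): void {
  store.write(String(nowMs));
}
```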

Amazon Web Services were used as the back-end infrastructure for the whole solution. Their first and foremost goal was to provide a high-availability, high-performance static file server for bf.js. As the number of study participants was unpredictable and any website could join the study at any time (by linking the script), the server had to be provisioned for high demand and be easily scalable. Instead of creating a virtual machine running Apache, Nginx or another type of server, Amazon's dedicated solution for serving static files was utilized. An S3 bucket is a file container that is part of Amazon's content delivery infrastructure. It is used as an asset server by Amazon itself, in the same way as it was used within this work.

The next element, built on the EC2 service, provided endpoints for data collection. The t2.medium virtual machine instance created for this purpose ran the Amazon Linux AMI and an Apache server. The latter served as a proxy to the core functionality: it handled AJAX requests initiated by bf.js and, through the WSGI module, executed their processing implemented in the Flask framework. Flask is a Python micro-framework suitable for applications exposing a small amount of functionality. Two endpoints were necessary to handle interactions, one for GET requests and one for POST. The first was a debugging routine that could be used to report an exception message if one occurred on the client side. The second gathered the fingerprints transferred as the JSON payload of POST requests. It was also responsible for assigning unique cookie identifiers (for the purpose of tracking fingerprint stability), extracting HTTP request headers and appending them to the dataset and, finally, connecting to the database instance to store the data. DynamoDB, Amazon's distributed NoSQL database offering good performance and high scalability, was used. Since the fingerprints (and therefore the requests) were substantial in size, it was provisioned with 15 MB/s throughput. Had that not been enough for the incoming traffic, it could have been increased easily, just as the t2.medium instance could have been upgraded. Fortunately, during the whole data collection period there was no need to change any of the configuration.

In order to collect a reasonable number of fingerprints, bf.js had to be linked from a minimum number of websites whose combined visitor traffic was large enough for the analysis. A study page, e-fingerprint.me, was created in order to find supporters. It provided all the essential information about the work while trying to persuade website administrators to get involved. Obviously, this was not an easy task, since executing a foreign script may cause serious damage. Therefore, the hosts that took part in the study were mostly colleagues of the author, apart from those who, after reviewing the script, agreed to participate on the condition that they could attach it as a local resource (to protect against script modification). The data was collected from 7 participating websites over a period of approximately one month. In total, 15042 records from 5038 users were obtained.

5 RESULTS

The evaluation environment described in the previous section made it possible to obtain a reasonable number of samples for further analysis. In total, 15042 samples were collected (with a total size of 1.36 GB).

5.1 Evaluation Criteria and Data Representation

Apart from identifying the best possible fingerprinting implementations of certain features, each attribute was analyzed according to the following criteria:


Diversity: the basic criterion of every fingerprinting study, a measure of how diverse a set of samples is, calculated independently for each attribute as entropy. Additionally, the numbers of distinct and unique values in the dataset were counted.

Stability: the second canonical criterion, describing how often a fingerprint changes its value over time. Four characteristics were calculated for each method: the total number of changes, the average time between changes, the number of devices for which at least one alteration was observed, and the average percentage of samples that were modified for these devices.

Length of Execution Code: as the number of collected fingerprints grows, together with the libraries necessary for processing, the size of the executed code becomes a limitation for some real-time-oriented businesses. Thus, the length of the minified code of each method implemented in bf.js was included.

Execution Time: advanced fingerprints rely on time-consuming processing, which imposes another limitation. Thus, the execution time of each method was measured independently, so that its average could be calculated.

Length of the Fingerprint: in a scenario where all the results are transferred to the server unchanged, their overall size is a drawback. The average length of the sent data was computed as the last criterion.
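The four stability characteristics can be computed per feature from time-ordered samples, as sketched below. The text does not define them formally, so two interpretations here are our assumptions: the time of a change is measured from the previous change (or from the device's first sample), and the modified ratio is the share of consecutive sample pairs whose values differ.

```typescript
interface Sample {
  device: string; // cookie-based device identifier
  timeMs: number; // collection timestamp in milliseconds
  value: string;  // value of a single fingerprinting feature
}

interface StabilityStats {
  totalChanges: number;
  avgMsBetweenChanges: number; // NaN if no change was observed
  devicesWithChange: number;
  avgChangedRatio: number;     // NaN if no device changed
}

function stability(samples: Sample[]): StabilityStats {
  const byDevice: { [d: string]: Sample[] } = Object.create(null);
  for (const s of samples) {
    if (!byDevice[s.device]) byDevice[s.device] = [];
    byDevice[s.device].push(s);
  }
  let totalChanges = 0;
  let gapSum = 0;
  let devicesWithChange = 0;
  let ratioSum = 0;
  for (const d of Object.keys(byDevice)) {
    const seq = byDevice[d].slice().sort((x, y) => x.timeMs - y.timeMs);
    if (seq.length < 2) continue; // a single sample says nothing about stability
    let changes = 0;
    let lastChangeAt = seq[0].timeMs;
    for (let i = 1; i < seq.length; i++) {
      if (seq[i].value !== seq[i - 1].value) {
        changes++;
        gapSum += seq[i].timeMs - lastChangeAt; // assumed: time since previous change
        lastChangeAt = seq[i].timeMs;
      }
    }
    if (changes > 0) {
      totalChanges += changes;
      devicesWithChange++;
      ratioSum += changes / (seq.length - 1); // assumed: share of changed transitions
    }
  }
  return {
    totalChanges: totalChanges,
    avgMsBetweenChanges: totalChanges > 0 ? gapSum / totalChanges : NaN,
    devicesWithChange: devicesWithChange,
    avgChangedRatio: devicesWithChange > 0 ? ratioSum / devicesWithChange : NaN,
  };
}
```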

Before the analysis, some essential data preprocessing was performed. From the 15042 samples, two processing sets were prepared:

• data unique: a set of unique samples used as the base for evaluating all criteria except stability. It was created by filtering the samples by the users' cookie-based identifiers: for each user, only the earliest observed sample was kept. 8350 entries were removed, so 6692 samples were preserved. Yet some cookies could have been removed in the meantime, so identical fingerprints could be stored under many cookie ids. An important assumption was made: in such a small dataset with a large number of fingerprinting methods, it is very unlikely that many collisions (two different devices having all fingerprints identical) could have occurred. Hence, a subsequent filtering step removed identical fingerprints from the dataset. In total, 1654 duplicates were dismissed, resulting in 5038 samples. The considerable number of recognized duplicates confirms that some users frequently remove cookies.

• data recurrent: a set of 8146 samples constructed by filtering out all users for whom only a single record was collected. In other words, this dataset preserves the data for which stability over time could be evaluated.
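The two preprocessing steps can be sketched as follows, assuming each raw sample carries its cookie identifier, a timestamp and the serialized fingerprint (the field names are ours). As in the text, identical fingerprints under different cookie ids are treated as the same device.

```typescript
interface RawSample {
  cookieId: string;    // identifier assigned via cookie by the collecting server
  timeMs: number;      // collection timestamp
  fingerprint: string; // serialized feature vector, compared for exact equality
}

// "data unique": the earliest sample per cookie id, then exact duplicates of the
// whole fingerprint are dropped (assumed: same device whose cookie was cleared).
function dataUnique(samples: RawSample[]): RawSample[] {
  const earliest: { [id: string]: RawSample } = Object.create(null);
  for (const s of samples) {
    const cur = earliest[s.cookieId];
    if (!cur || s.timeMs < cur.timeMs) earliest[s.cookieId] = s;
  }
  const seen: { [fp: string]: boolean } = Object.create(null);
  const out: RawSample[] = [];
  for (const id of Object.keys(earliest)) {
    const s = earliest[id];
    if (seen[s.fingerprint]) continue; // same fingerprint under another cookie id
    seen[s.fingerprint] = true;
    out.push(s);
  }
  return out;
}

// "data recurrent": all samples of users with more than one record.
function dataRecurrent(samples: RawSample[]): RawSample[] {
  const count: { [id: string]: number } = Object.create(null);
  for (const s of samples) count[s.cookieId] = (count[s.cookieId] || 0) + 1;
  return samples.filter((s) => count[s.cookieId] > 1);
}
```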

With about 5000 samples, this evaluation could achieve at most $\log_2 5000 \approx 12.3$ bits of entropy.

5.2 Discussion

The number of features that could be fingerprinted is immense. This work focuses on browser fingerprinting. Fingerprints have been divided into two categories based on the source of information: JavaScript code executed within the client browser, or HTTP headers obtained on the server side. It is important to note that browser fingerprinting has no explicit legal interpretation. Some fingerprints have a questionable reputation and are thus denounced within specific communities. This study does not focus on legal issues, and no use of fingerprints with a poor reputation was intended. All the collected samples were gathered for educational purposes.

Many properties exposed by JavaScript APIs (e.g. window, navigator) carry valuable information. Most of the available fingerprinting solutions check those values only in a true/false dimension. However, this is not a correct approach, since different browser versions may handle them quite unexpectedly, for instance returning false, null or 0 as the negative value. Treating all of these as false discards precious data that the fingerprint is meant to collect. Moreover, an additional piece of information can be obtained by slightly more detailed querying, namely by adding vendor prefixes. Before the final standards were created, some properties used to be prefixed with webkit, moz, ms or o, respectively for Chrome, Firefox, Internet Explorer and Opera. These prefixes allowed developers to deal with inconsistencies between browsers. Prefixes for certain properties still work, even though they are often marked as deprecated. Such checks were included in the evaluation.
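A sketch of such non-boolean, prefix-aware probing follows. The prefix list comes from the text; the result encoding (type plus value) and the function names are our choices. In a browser, a host such as navigator or document would need a cast (e.g. document as unknown as { [k: string]: unknown }), because the DOM types have no index signature.

```typescript
const PREFIXES = ["", "webkit", "moz", "ms", "o"];

// Encodes a value without collapsing it to true/false: false, 0, null and
// undefined stay distinguishable; objects and functions are recorded by type only.
function describeValue(value: unknown): string {
  if (value === null) return "null";
  if (value === undefined) return "undefined";
  const t = typeof value;
  if (t === "object" || t === "function") return t;
  return t + ":" + String(value);
}

// Looks up a property under every vendor prefix, e.g. "hidden" ->
// hidden, webkitHidden, mozHidden, msHidden, oHidden.
function probePrefixed(host: { [k: string]: unknown }, prop: string): { [key: string]: string } {
  const result: { [key: string]: string } = {};
  for (const p of PREFIXES) {
    const key = p === "" ? prop : p + prop.charAt(0).toUpperCase() + prop.slice(1);
    result[key] = describeValue(host[key]);
  }
  return result;
}
```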

Canvas Fingerprinting. Canvas is an HTML element used to draw basic 2D graphics on a web page. Since this fingerprint has been very popular in past years, many different ways of implementing it have been developed. In this paper, 12 canvas fingerprint tests were collected to answer the question of which properties are the most valuable. As a result, the following conclusions were drawn:

• The canvas size (width and height) has a considerable impact on the entropy. When all drawn elements are bigger, the number of unique fingerprints is significantly larger and the entropy increases.


• Tests for blending and winding support improved the overall result.

• The smile icon rendering test achieved a surprisingly high entropy score. The most common values in the dataset were the following (some of them seem to be identical, although there are small differences when compared in binary):

• Surprisingly, using a fake (fallback) font yielded lower entropy than using the widely accessible Arial font, even though it registered larger numbers of unique and distinct values.

• Adding a number to the text increased overall diversity. As the test for special characters was not implemented properly (as an extension instead of a method replacement), its result does not allow drawing any particular conclusions.

The most advanced canvas test (canvas-advanced) obtained 8.08 bits of entropy. This is a significant score, but other criteria must be considered as well. The test proved quite unstable (90 changes, on average every 4.5 days) and time-consuming (0.2s), and it produced the longest value of all collected fingerprints (21KB). The individual tests indicate that the "smile" icon (canvas-fontSmiles) is the primary source of both instability and entropy. The bigger the canvas and the drawn elements, the higher the entropy, the instability and the execution time. The only stable element appears to be the font drawing (canvas-basic, canvas-font*). Nevertheless, an average fingerprint size of 21KB is too large for most applications. Fortunately, applying a hash function can solve this issue, provided that some additional loss of uniqueness is acceptable.
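The hashing step mentioned above can be sketched as follows. This is a minimal illustration, not the study's implementation. It assumes the canvas fingerprint is available as a data-URL string (in the browser, the value returned by canvas.toDataURL()). SHA-256 from Python's hashlib is an arbitrary choice of hash function, and the truncation parameter hex_chars is hypothetical.

```python
import hashlib


def compact_fingerprint(data_url: str, hex_chars: int = 64) -> str:
    """Reduce an arbitrarily long canvas fingerprint to a short hex digest.

    SHA-256 yields 64 hex characters. Keeping fewer (hex_chars < 64) shrinks
    the payload further at the cost of a higher collision probability.
    """
    digest = hashlib.sha256(data_url.encode("utf-8")).hexdigest()
    return digest[:hex_chars]


# Hypothetical stand-in for a ~21KB canvas data URL.
sample = "data:image/png;base64," + "A" * 21_000
print(len(sample), "->", len(compact_fingerprint(sample)))  # 21022 -> 64
print(len(compact_fingerprint(sample, hex_chars=16)))        # 16
```

Identical renderings still produce identical digests, so exact matching keeps working. However, a one-pixel difference yields an unrelated digest, so the similarity of two renderings can no longer be measured from the hashes.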

Cookies and Web Storage API Support. Browsers expose the cookie support setting via the navigator.cookieEnabled property. Cookies, local storage and session storage were tested in two ways. The first used JavaScript properties (e.g. navigator.cookieEnabled, which reports the setting). The second was an active evaluation following this scenario: obtain a storage handle, write some data into it, check that the saved data exists, and remove the data. If the check for saved content failed or an exception was raised, the storage mechanism was considered disabled. The results reveal that this method detected a few "lied" situations for local and session storage, while for cookies the property value always gave the same answer as the active test. Unfortunately, even though storage fingerprints are stable and cheap to execute, their low entropy makes them relatively irrelevant. It is also worth noting that only 2 distinct values were observed for the cookie test, while larger studies collected up to 7 configurations. This indicates that the small amount of collected data does not support widely applicable conclusions.
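The active probe described above (obtain a handle, write, read back, remove) can be sketched as follows. The browser's localStorage and sessionStorage objects are replaced by hypothetical Python stand-ins that expose the Web Storage method names setItem, getItem and removeItem. The probe key and value are also illustrative choices.

```python
from typing import Callable, Optional, Protocol


class StorageLike(Protocol):
    """The subset of the Web Storage interface the probe relies on."""

    def setItem(self, key: str, value: str) -> None: ...
    def getItem(self, key: str) -> Optional[str]: ...
    def removeItem(self, key: str) -> None: ...


def storage_enabled(get_storage: Callable[[], StorageLike]) -> bool:
    """Write, read back and remove a probe value; any failure means 'disabled'."""
    key, value = "__fp_probe__", "1"  # hypothetical probe key/value
    try:
        storage = get_storage()  # obtaining the handle may itself raise
        storage.setItem(key, value)
        persisted = storage.getItem(key) == value
        storage.removeItem(key)
        return persisted
    except Exception:
        return False


class DictStorage:
    """Working storage backed by a dict."""

    def __init__(self) -> None:
        self._data: dict = {}

    def setItem(self, key: str, value: str) -> None:
        self._data[key] = value

    def getItem(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def removeItem(self, key: str) -> None:
        self._data.pop(key, None)


class SilentStorage(DictStorage):
    """'Lying' storage: accepts writes but never keeps them."""

    def setItem(self, key: str, value: str) -> None:
        pass


class BlockedStorage(DictStorage):
    """Storage that rejects every write."""

    def setItem(self, key: str, value: str) -> None:
        raise PermissionError("storage is disabled")


print(storage_enabled(DictStorage))     # True
print(storage_enabled(SilentStorage))   # False
print(storage_enabled(BlockedStorage))  # False
```

Only an active probe of this kind can reveal storage that appears available but silently discards writes. A property check alone would report such storage as enabled.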

CPU Class. This property is presumably present only in Firefox and Internet Explorer (as the oscpu and cpuClass properties, respectively), while in Chrome it is part of appVersion. In 95% of cases, navigator.cpuClass did not return any value. Of the remaining devices, 259 returned x86, 40 returned ARM and x64 was observed twice, resulting in 0.25 bits of information in total. The oscpu property returned much more interesting results, with 72% empty values. Unexpectedly, it reports not only the CPU architecture but also the OS version, which raises its entropy (1.76 bits). Since both fingerprints were stable and their execution cost was negligible, each of them, considered independently, is a good choice for any algorithm.
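The entropy figures quoted in this section are Shannon entropies of the observed value distributions. The sketch below shows that computation. It assumes missing values are counted as one ordinary category, which is why a property that is empty for most devices contributes so little. The counts in the usage lines are illustrative, not the study's data.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Entropy, in bits, of the empirical distribution of observed values.

    Missing values (e.g. "" or None) are treated as one ordinary category.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Illustrative data: 95 devices report nothing, 5 report "x86".
print(round(shannon_entropy([""] * 95 + ["x86"] * 5), 2))  # 0.29
print(shannon_entropy(["x86", "ARM", "x64", ""]))          # 2.0
```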

Do Not Track (DNT) Header. Users can set the "Do Not Track" flag, indicating that they do not wish to be tracked. Unfortunately, no law requires websites to respect this setting. Internet Explorer 10 was released with the DNT header enabled by default, which caused considerable controversy. Since then, browsers do not add this flag unless the user explicitly enables it. This fingerprint was collected in JavaScript using two different objects: navigator and window. The obtained results were mutually exclusive and did not coincide with the values observed on the back-end side. The user's real setting is therefore unclear. This does not prevent these attributes from being useful for fingerprinting, owing to their relatively high entropy compared with their small number of distinct values (2 or 3). Paradoxically, a feature created to protect privacy proved to be a valuable addition for this study.
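For reference, consider an attribute X with k distinct values, where p_i is the relative frequency of the i-th value. Its entropy H(X) is bounded by log2 k, with equality when all values are equally frequent:

```latex
H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i \;\le\; \log_2 k,
\qquad \log_2 2 = 1, \qquad \log_2 3 \approx 1.585
```

A two- or three-valued attribute such as the DNT setting can therefore contribute at most 1 bit or about 1.58 bits, respectively.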

Fonts Fingerprinting. The complete list of fonts installed in the system can form another complex fingerprint. Browsers do not provide a way to retrieve it without external plug-ins (Adobe Flash or Java), but there are workarounds that obtain a partial list. There are two methods of fingerprinting fonts, canvas-based and CSS-based, and this study aimed to identify the more efficient one. At a very early stage of sample collection, it was already clear that the CSS-based method is much more attractive than canvas probing. Because the canvas tests substantially increased overall processing time, they were removed entirely from the bf.js script. The observations for the two methods compare as follows:

• Canvas-based font probing was on average roughly three times slower.

• CSS detection slightly outperforms canvas, but both methods detect almost all installed fonts (assessed by manual verification).

• Consider foreign fonts containing non-Latin characters (e.g. Japanese scripts). CSS probing detected such a font even though those characters were not included in the test string, whereas the canvas method did not. The author suspects that the CSS method "reserves" space (the maximal height) for every character supported by a font, even if that character is not printed.

• In some browsers, discrepancies of 1 pixel were observed. Therefore, the tests were adjusted to tolerate this margin of error.

• Two test strings increased the detection rate compared with the string proposed in other studies (based on the letters m and w): a string containing the full alphabet, and the string chosen for the font entropy assessment (adfgjlmrsuvwwwwz7901).

• A test string size of 70 pixels produced almost identical results to sizes of 180 or 200 pixels.

• The monospace fallback font was slightly more effective than sans-serif, in both CSS and canvas tests.

• The only drawback of the CSS method is that it must be executed in the user's DOM, which risks affecting the website's appearance (canvas works in the background).

Two additional observations remain unexplained. First, drawing with monospace as the fallback font was on average 10 times faster than drawing with sans-serif. The author did not find any confirmed explanation for this. One suspicion is that the monospace tests were optimized after the sans-serif checks had run, although no particular execution order was guaranteed. Second, in the CSS-based tests, drawing strings at 200 pixels was twice as fast as at 70 pixels. The same possible explanation applies.
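The CSS-based probe compared above can be sketched as follows. In the browser, the measurement would come from the offsetWidth and offsetHeight of a hidden span rendering the test string. Here that step is abstracted into a hypothetical measure callback, while the fallback list and the 1-pixel tolerance mirror the observations above. Reporting a font as present when it differs from any fallback is an assumption of this sketch.

```python
from typing import Callable, Sequence, Tuple

Size = Tuple[int, int]  # (width, height) of the rendered test string, in pixels


def font_available(
    font: str,
    measure: Callable[[str], Size],
    fallbacks: Sequence[str] = ("monospace", "sans-serif"),
    tolerance_px: int = 1,
) -> bool:
    """Return True if text styled as "'font', fallback" renders at a size that
    differs from the fallback alone by more than tolerance_px in either dimension."""
    for fallback in fallbacks:
        base_w, base_h = measure(fallback)
        w, h = measure(f"'{font}', {fallback}")
        if abs(w - base_w) > tolerance_px or abs(h - base_h) > tolerance_px:
            return True
    return False


# Hypothetical measurements for two hypothetical fonts.
sizes = {
    "monospace": (700, 82), "sans-serif": (640, 80),
    "'FontA', monospace": (655, 83), "'FontA', sans-serif": (655, 83),
    "'FontB', monospace": (701, 82), "'FontB', sans-serif": (641, 81),
}
print(font_available("FontA", lambda css: sizes[css]))  # True
print(font_available("FontB", lambda css: sizes[css]))  # False: only 1px jitter
```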

Another important aspect of font evaluation is determining which subset of fonts to probe. A font that is missing for every user, or present in all samples, does not help to distinguish devices. The maximum entropy of a single font (1 bit) is reached when the font is present in exactly half of the samples. Yet choosing only such fonts will not maximize the output, since the presence of many fonts is strongly interdependent. Therefore, an extensive list of 821 fonts was prepared and a sample was collected for each of them. An iterative entropy maximization algorithm was then executed to find the optimal collection. In the best scenario, 6 bits were achieved with the following 9 fonts (ordered from the most valuable): Open Sans, Brush Script MT, Estrangelo Edessa, Gadugi, Roman, Papyrus, MT Extra, Wingdings, Segoe UI Semibold. Above 8 bits, the number of fonts required to improve the entropy increases drastically. After 9 bits were reached, the remaining 746 fonts barely improved the result. This shows how important choosing the right collection is. It matters not only for diversity but also for code execution time (3.5s) and stability (187 changes, on average every 6 days), since this fingerprint achieved the worst results in both categories. Reducing the set of fonts from 821 to 100 would decrease the average probing time to around 0.4s, which may be acceptable for certain uses. Stability should improve as well, although the fontJs-sans-70px-65 test, which probes only 65 fonts, still shows alarmingly high instability (132 changes, on average every 7 days). A short investigation revealed three main categories of changes: (1) installation of a single font, (2) a large set of fonts changing status from absent to present, and (3) fluctuations of a single font. The first two categories may indicate that the user installed an additional font or new software, and unfortunately nothing can be done to prevent them. However, repeated status changes of a particular font are unlikely to be caused by user action. Thus, the third category suggests either room for improving the detection algorithm or the need to investigate the cause more deeply.
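The iterative entropy maximization mentioned above is not specified in detail here. The sketch below assumes a greedy forward selection: each step adds the candidate font that most increases the joint entropy of the presence/absence vector across devices. The tiny dataset is hypothetical. It shows why fonts worth 1 bit each need not add up: font C is present exactly where A is absent, so once A is chosen it adds nothing, and font D is present everywhere and contributes 0 bits.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def joint_entropy(samples: Sequence[Set[str]], fonts: Sequence[str]) -> float:
    """Entropy (bits) of the vector of presence flags for `fonts` across devices."""
    vectors = Counter(tuple(f in installed for f in fonts) for installed in samples)
    n = len(samples)
    return sum(-(c / n) * math.log2(c / n) for c in vectors.values())


def greedy_font_selection(
    samples: Sequence[Set[str]], candidates: Sequence[str], k: int
) -> List[str]:
    """Repeatedly add the font that maximizes joint entropy (ties: first candidate)."""
    selected: List[str] = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda f: joint_entropy(samples, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical installed-font sets for four devices.
samples = [{"A", "B", "D"}, {"A", "D"}, {"B", "C", "D"}, {"C", "D"}]
print(joint_entropy(samples, ["A"]))                               # 1.0
print(joint_entropy(samples, ["A", "C"]))                          # 1.0
print(greedy_font_selection(samples, ["A", "B", "C", "D"], 2))     # ['A', 'B']
```

Each step costs one joint-entropy evaluation per remaining candidate. Running it over a few hundred fonts and a few thousand devices is therefore cheap compared with collecting the samples.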

Language Setting. The language property of the navigator object is supposed to return the user's preferred language in the format described by the RFC specification, e.g. en-US, pl-PL or de-Latin-CH 1992 [29]. Four methods of obtaining the language were implemented. The broadly supported (99.9%) navigator.language property provided 2.1 bits of information. The remaining tests returned a result in only 5% of cases, and since their values were mostly identical, they achieved almost no entropy. Still, thanks to their decent stability and low execution cost, all of these features are worth considering.

Platform Fingerprint. navigator.platform represents the platform on which the code is executed. The set of possible values is not closed, and the representation may differ from browser to browser. Example values are Linux aarch64, MacIntel, iPhone, Nokia Series 40 and PlayStation 4. This fingerprint changed its value only once, making it one of the most stable. The dataset contained 16 distinct values, 3 of them unique (1.57 bits of entropy).

Screen Properties. The window.screen object exposes properties such as the device screen's color depth, resolution and available resolution. The latter represents the space that application windows may occupy (excluding system menu bars). When fingerprinting resolutions, some solutions additionally determine the screen orientation, depending on which value is greater (width or height). Again, by doing so they incorrectly create another artificial fingerprint, since the orientation is derived from the resolution and adds no information of its own. Moreover, orientation may harm stability, as users may change it quite often. The screenColorDepth and screenPixelRatio tests collected stable but rather uniform values, providing 0.74 and 0.82 bits of entropy, respectively. However, the screen dimensions method yielded surprisingly diverse (5.76 bits) and unstable results (90 changes, on average every 3 days). The instability was unexpected, since the test did not take the screen orientation into account. The entropy lost by ignoring orientation was also analyzed and amounted to only 0.25 bits. Both methods frequently yielded different values for the same users, although changes in window.screen.availHeight and availWidth dominated the final result. Some changes were marginal (e.g. from 404 to 401 pixels), and their cause should be investigated further. Many other changes appear to be switches to an entirely new resolution on the same device or to an external display. The latter seems rare, since color depth and pixel ratio did not change.
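One way to keep rotation from changing the screen fingerprint, without deriving a separate orientation attribute, is to record the longer and shorter side instead of width and height. The sketch below assumes the inputs are the browser's screen.width, screen.height, screen.colorDepth and window.devicePixelRatio values. The key format is hypothetical.

```python
def screen_key(width: int, height: int, color_depth: int, pixel_ratio: float) -> str:
    """Orientation-independent screen fingerprint: '<long>x<short>x<depth>@<ratio>'."""
    long_side, short_side = max(width, height), min(width, height)
    return f"{long_side}x{short_side}x{color_depth}@{pixel_ratio:g}"


portrait = screen_key(412, 732, 24, 2.625)
landscape = screen_key(732, 412, 24, 2.625)
print(portrait)               # 732x412x24@2.625
print(portrait == landscape)  # True
```

Using screen.width and screen.height rather than availWidth and availHeight might also help, since the available area changes with operating-system bars. Whether that removes the marginal changes reported above would need to be tested.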

Timezone. Using the JavaScript Date object, one can request an offset that represents the user's system timezone setting in 15-minute slots. Browsers may return quite unexpected numbers here⁴, which, properly interpreted, could make a valuable fingerprint. The timezone fingerprint, with 22 distinct and 7 unique values, scored only 0.74 bits of entropy. Still, it is very stable and cheap to execute, so it is worth considering.
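Date.prototype.getTimezoneOffset() returns the difference UTC minus local time in minutes, so zones east of Greenwich yield negative numbers. A small helper like the hypothetical one below makes the raw value readable. The examples include half-hour and quarter-hour zones.

```python
def utc_offset_label(js_offset_minutes: int) -> str:
    """Convert a getTimezoneOffset() value (UTC - local, minutes) to 'UTC±HH:MM'."""
    local_minus_utc = -js_offset_minutes
    sign = "+" if local_minus_utc >= 0 else "-"
    hours, minutes = divmod(abs(local_minus_utc), 60)
    return f"UTC{sign}{hours:02d}:{minutes:02d}"


for raw in (-60, 0, 300, -330, -345):
    print(raw, utc_offset_label(raw))
# -60 UTC+01:00
# 0 UTC+00:00
# 300 UTC-05:00
# -330 UTC+05:30
# -345 UTC+05:45
```

The current offset shifts with daylight saving time. Sampling it for a fixed January date and a fixed July date (e.g. new Date(2016, 0, 1).getTimezoneOffset() in JavaScript) would give a pair of values that stays constant all year. This is an assumption worth testing, not something evaluated in the study.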

Touch Support Detection. The evaluation of 6 detection methods suggests that three of them are interchangeable, as they all marked the same devices as touch-enabled (25% of the dataset, 0.75 bits of entropy). The touchSup-maxPoints test and the second part of the Modernizr library [34] check returned false for all devices. The Internet Explorer property msPointer marked additional devices as touch-enabled (0.24 bits), so an ideal solution could combine these features.

WebGL Fingerprints. The WebGL JavaScript API allows drawing on a three-dimensional canvas in the browser and, used properly, provides another example of hardware fingerprinting. Images obtained with this technology can be translated into text in the same way as for canvas fingerprinting, and can therefore be easily compared. Additionally, a variety of settings that can extend the fingerprint are accessible through the getParameter and getShaderPrecisionFormat methods. Besides the WebGL drawing fingerprint, 10 categories of properties were collected. Their high entropy makes them valuable, yet many samples changed over time (on average after 36 hours). As most of the tests performed similarly, they do not support any conclusions individually, so an additional evaluation assessed the attributes together. Combining the drawing fingerprint with all properties achieved only 6.31 bits of entropy, and in total 73 values changed within a relatively short period of time (27 hours). Consider also the execution time of 0.4 seconds, the large code size (6KB) and the final sample size of 5KB. Given these costs, this study does not support the conclusion that WebGL features are a necessary addition to fingerprinting algorithms in general.

5.3 Summary

This section selects the most efficient features for a client-side production fingerprinting algorithm. Additionally, it summarizes important observations that could help create a more advanced solution using server-side logic (and HTTP-based fingerprints).

Client-side Solution. Weighing the expectations for an optimal fingerprinting script, the following key points were defined as criteria for the final selection:

• The script should not fingerprint any of the features classified as unstable.

• As many features as possible should be employed to ensure maximal diversity. A feature that meets all the other criteria should be included in the algorithm even if its individual entropy is barely noticeable. The number of samples collected in this study is not large enough to justify permanently rejecting an attribute.

• The script's execution time should not exceed 0.5s on average, since many use cases aim to block abusive users, which must happen as soon as they enter a website.

• The size of the final code bundle should be minimized to reduce download time and save bandwidth on mobile devices.
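The first and third criteria can be applied mechanically. The second suggests preferring cheap features so that as many as possible fit the time budget. The sketch below follows those assumptions. It also assumes that features run sequentially, so their average times add up. The feature names partly follow test names from this study, but the list and the timings for timezone and platform are illustrative rather than measured values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Feature:
    name: str
    stable: bool
    avg_time_s: float  # average execution time in seconds


def select_features(features: Sequence[Feature], budget_s: float = 0.5) -> List[str]:
    """Keep stable features only, cheapest first, while the summed time fits the budget.

    Entropy is deliberately ignored: a feature meeting the criteria is kept even if
    its individual entropy is small.
    """
    chosen: List[str] = []
    spent = 0.0
    for f in sorted((f for f in features if f.stable), key=lambda f: f.avg_time_s):
        if spent + f.avg_time_s <= budget_s:
            chosen.append(f.name)
            spent += f.avg_time_s
    return chosen


candidates = [
    Feature("timezone", True, 0.001),
    Feature("platform", True, 0.001),
    Feature("canvas-advanced", False, 0.2),
    Feature("fonts-css-100", True, 0.4),
    Feature("fonts-css-821", True, 3.5),
]
print(select_features(candidates))  # ['timezone', 'platform', 'fonts-css-100']
```

When every feature counts equally, taking the cheapest ones first maximizes how many fit within a fixed time budget. Weighting features by entropy would require a different, knapsack-style selection.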

A few of the implemented tests were found to need improvement in order to meet these criteria. Therefore, to measure the characteristics of an algorithm built from an optimal set of implementations, the dataset was translated into the results that the improved fingerprinting methods would have yielded.

The only issue was the lack of real-world execution time data, so an estimate was made based on the performance of the old methods. The results achieved by all fingerprinting methods together, as implemented in bf.js, were compared with the fingerprinting efficiency of an algorithm using only the selected features. The entropy obtained with the first solution is extraordinarily satisfactory, in fact almost ideal for the available dataset. Yet bf.js could not be used in a production environment, since it was not built with that intention. Its execution time is exceedingly high (3.9s), and its instability (a change observed every 3.5 days) leaves much to be desired. The production solution, in contrast, met all the expectations listed above while achieving similarly high diversity, with only 0.3 fewer bits of entropy. Its execution time of 0.4s is excellent, the number of changes dropped by half, and the average time between changes improved by almost 3 days, which is far more acceptable.
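The stability figures used in this section (number of changes and average time between changes) can be computed from per-device visit histories. The exact definition used in the study is not given here. The sketch below therefore assumes that a change is counted whenever a device's value differs from its previous visit, and that the interval is how long the previous value lasted. The history is hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Visit = Tuple[datetime, str]  # (visit time, fingerprint value)


def stability(history: Dict[str, List[Visit]]) -> Tuple[int, float]:
    """Return (number of value changes, mean days a value lasted before changing)."""
    changes = 0
    lifetimes_days: List[float] = []
    for visits in history.values():
        ordered = sorted(visits)
        if not ordered:
            continue
        since, current = ordered[0]
        for when, value in ordered[1:]:
            if value != current:
                changes += 1
                lifetimes_days.append((when - since).total_seconds() / 86400)
                since, current = when, value
    mean_days = sum(lifetimes_days) / len(lifetimes_days) if lifetimes_days else float("inf")
    return changes, mean_days


history = {
    "device-1": [(datetime(2016, 8, 1), "a"), (datetime(2016, 8, 3), "a"), (datetime(2016, 8, 5), "b")],
    "device-2": [(datetime(2016, 8, 1), "x"), (datetime(2016, 8, 2), "y")],
}
print(stability(history))  # (2, 2.5)
```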

Server-based Solutions. The 6 days of fingerprint stability achieved by the proposed production solution fall far behind cookie-based identifiers, which can last for years. More advanced techniques are a natural next step for improving fingerprint creation. This work explored certain aspects of a potential server-based solution, so a few conclusions that could be useful in building one are summarized here.

The primary obstacle is transferring the data obtained in the browser to the server. The length of certain fingerprints (e.g. canvas, WebGL) proved unacceptable, so the author suggests compressing the data by applying a hashing algorithm before the transfer. If the server logic tracks value changes, a locality-preserving hash could be used, since it would allow the extent of a change to be measured. The most expensive fingerprints could be sent as such hashes, and the remaining ones translated and compressed (e.g. a true/false setting sent as one bit of information, or common phrases mapped to shorter symbols). Together, these measures could reduce the need for a CORS POST request. Because CORS introduces a noticeable connection overhead, a fingerprint compressed enough to fit into a GET parameter would significantly improve networking performance.
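The translation and compression ideas above can be sketched as follows. Boolean settings are packed into bits, a lookup table replaces frequent long values with short symbols, and the result is encoded in URL-safe Base64 so it can travel as a GET parameter. The symbol table, parameter names and choice of fields are hypothetical. The CORS overhead mentioned above typically comes from the preflight OPTIONS request that browsers send before cross-origin requests that are not "simple", such as a POST with a JSON body. A plain GET without custom headers avoids it.

```python
import base64
from typing import Dict, Sequence
from urllib.parse import urlencode

# Hypothetical table mapping frequent long values to short symbols.
SYMBOLS: Dict[str, str] = {"Win32": "w", "MacIntel": "m", "Linux x86_64": "l"}


def pack_flags(flags: Sequence[bool]) -> bytes:
    """Pack booleans into bits, 8 per byte, first flag in the most significant bit."""
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def fingerprint_query(flags: Sequence[bool], platform: str, canvas_hash: str) -> str:
    """Build a compact query string from flags, one mapped value and one hash."""
    params = {
        "f": base64.urlsafe_b64encode(pack_flags(flags)).decode("ascii").rstrip("="),
        "p": SYMBOLS.get(platform, platform),  # unknown values are sent unchanged
        "h": canvas_hash,
    }
    return "?" + urlencode(params)


# Nine hypothetical true/false settings (cookies, DNT, touch, ...).
flags = [True, False, True, True, False, False, False, False, True]
print(pack_flags(flags).hex())                        # b080
print(fingerprint_query(flags, "Win32", "9f86d081"))  # ?f=sIA&p=w&h=9f86d081
```

Nine flags fit in two bytes, i.e. three Base64 characters. To decode them, the server needs the same flag order and symbol table.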

To improve the JavaScript execution time, the code length and the size of transferred data, some fingerprints could be processed on the back-end instead of in the user's browser. For example, the User-Agent HTTP request header holds the same information as the value returned by the JavaScript API (navigator.userAgent), so the server could use parsing libraries to extract meaningful data.
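A deliberately rough sketch of such server-side extraction follows. It assumes only the raw User-Agent header string is available. The patterns cover a few common browser and OS families and are illustrative only; a production system would rely on a maintained parsing library, as suggested above.

```python
import re
from typing import Dict, Optional, Tuple

Table = Tuple[Tuple[str, str], ...]

# Order matters: Edge strings also contain "Chrome", and both contain "Safari".
BROWSERS: Table = (
    ("Edge", r"Edg(?:e)?/([\d.]+)"),
    ("Chrome", r"Chrome/([\d.]+)"),
    ("Firefox", r"Firefox/([\d.]+)"),
    ("Safari", r"Version/([\d.]+).*Safari/"),
)
# Android strings also contain "Linux", so Android is checked first.
SYSTEMS: Table = (
    ("Windows", r"Windows NT ([\d.]+)"),
    ("Android", r"Android ([\d.]+)"),
    ("macOS", r"Mac OS X ([\d_.]+)"),
    ("Linux", r"Linux()"),
)


def _first_match(ua: str, table: Table) -> Tuple[Optional[str], Optional[str]]:
    """Return (family, version) for the first pattern found in ua."""
    for name, pattern in table:
        m = re.search(pattern, ua)
        if m:
            return name, (m.group(1) or None)
    return None, None


def parse_user_agent(ua: str) -> Dict[str, Optional[str]]:
    browser, browser_version = _first_match(ua, BROWSERS)
    os_name, os_version = _first_match(ua, SYSTEMS)
    return {"browser": browser, "browser_version": browser_version,
            "os": os_name, "os_version": os_version}


ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36")
print(parse_user_agent(ua))
# {'browser': 'Chrome', 'browser_version': '54.0.2840.99', 'os': 'Windows', 'os_version': '10.0'}
```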

6 CONCLUSIONS

Fingerprinting, as a mechanism used in security and advertising, plays an important role in web tracking. This work shows that this storage-less technique is demanding and that developing an efficient fingerprinting algorithm requires considerable effort. The resulting solution showed satisfactory performance in terms of diversity, execution time and code bundle size. However, its stability, which is essential in most use cases, needs improvement.

This work conducted the first evaluation of different fingerprint implementations and produced an optimal set of features, and it also supports many additional conclusions. The analysis of existing solutions revealed some misconceptions they introduce: creating artificial fingerprints, such as browser tampering, only degrades overall efficiency. Some fingerprints (ad-block extension detection, Flash-based) were found to differ between regular browsing and private mode, which should make no difference to a reliable algorithm. The instability of certain fingerprints was observed and discussed together with its potential causes and possible improvements. Finally, this work demonstrates the superiority of CSS-based font probing over canvas-based solutions and makes it possible to select a reference set of fonts providing the best detection performance. In addition, some important objectives for an advanced server solution were identified.

The outcome of this research represents noticeable progress in the analysis of fingerprinting solutions. The discovered features and corresponding optimal implementations will enrich and improve the open-source fingerprinting library Fingerprintjs2.

This study could not evaluate many additional features that could be fingerprinted, so the remaining ideas could be analyzed in future work. Certain test outcomes did not permit a full assessment, so continuing their evaluation could bring important findings about their usability. Importantly, the short data collection period produced a decent but limited dataset, which did not allow reliable conclusions in a few respects. Follow-up research should therefore be conducted over the long term to eliminate such concerns. Device fingerprinting proves to be a powerful technique, yet it leaves much room for improvement. Further research is needed to narrow the efficiency gap with well-known storage-based methods.

REFERENCES

Acar, G., Eubank, C., Englehardt, S., Juarez, M., Narayanan, A., and Diaz, C. (2014). The web never forgets: Persistent tracking mechanisms in the wild. Technical report, Princeton University, KU Leuven.

Acar, G., Juarez, M., Nikiforakis, N., Diaz, C., Gürses, S., Piessens, F., and Preneel, B. (2013). FPDetective: dusting the web for fingerprinters. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pages 1129–1140. ACM.


Augur (2016). Augur, a set of APIs and tools that instantly enables businesses to recognize devices, and consumers across devices. [on-line] https://www.augur.io/ (retrieved: 08/2016).

Boda, K., Földes, A. M., Gulyás, G. G., and Imre, S. (2011). User tracking on the web via cross-browser fingerprinting. In Nordic Conference on Secure IT Systems, pages 31–46. Springer.

BrowserSpy (2016). BrowserSpy on-line fingerprinting test tool. [on-line] http://browserspy.dk/ (retrieved: 08/2016).

Cahn, A., Alfeld, S., Barford, P., and Muthukrishnan, S.

(2016). An empirical study of web cookies. In

Proceedings of the 25th International Conference on

World Wide Web, WWW ’16, pages 891–901.

Checklist, S. (2016). Web browser security checklist.

[on-line] https://www.browserleaks.com/ (retrieved:

08/2016).

Cross-browser (2016). Cross-browser fingerprinting test 2.0. [on-line] https://fingerprint.pet-portal.eu/ (retrieved: 08/2016).

Eckersley, P. (2010). How unique is your web browser? In International Symposium on Privacy Enhancing Technologies, pages 1–18. Springer.

Englehardt, S. and Narayanan, A. (2016). On-line tracking: A 1-million-site measurement and analysis. Technical report, Princeton University.

Fetterly, D., Manasse, M., Najork, M., and Wiener, J.

(2003). A large-scale study of the evolution of web

pages. In Proceedings of the 12th International Con-

ference on World Wide Web, WWW ’03, pages 669–

678. ACM.

Fingerprints2 (2016). Fingerprintjs2 - modern browser fingerprinting library. [on-line] https://github.com/valve/fingerprintjs2.

Frontier, E. (2016). On-line fingerprinting test conducted by Electronic Frontier Foundation. [on-line] https://panopticlick.eff.org/ (retrieved: 08/2016).

HTML5 (2016). HTML5, a vocabulary and associated APIs for HTML and XHTML. [on-line] https://www.w3.org/tr/html5/ (retrieved: 08/2016).

Kamkar, S. (2016). Evercookie: virtually irrevocable persistent cookies. [on-line] http://samy.pl/evercookie/ (retrieved: 08/2016).

Kurent, A. (2016). Cross-browser device fingerprinting, diploma thesis. [on-line] http://fingerprinting.comyr.com/ (retrieved: 08/2016).

Low, C. (2016). Cookie law explained. [on-line] https://www.cookielaw.org/the-cookie-law/ (retrieved: 08/2016).

Persistent (2016). Usage of persistent cookies for websites. [on-line] https://w3techs.com/technologies/details/ce-persistentcookies/all/all (retrieved: 08/2016).

RFC (2016). RFC 6265 specification: HTTP state management mechanism. [on-line] https://tools.ietf.org/html/rfc6265 (retrieved: 08/2016).

Tillmann, H. (2016). Browser fingerprinting test by Henning Tillmann. [on-line] http://bfp.henning-tillmann.de/ (retrieved: 08/2016).

Webkit2016 (2016). Fingerprinting in WebKit. [on-line] https://trac.webkit.org/wiki/fingerprinting.

Yen, T.-F., Huang, X., Monrose, F., and Reiter, M. K. (2009). Browser fingerprinting from coarse traffic summaries: Techniques and implications. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pages 157–175. Springer.

Yen, T.-F., Xie, Y., Yu, F., Yu, R. P., and Abadi, M. (2012). Host fingerprinting and tracking on the web: Privacy and security implications. In NDSS.
