Experimental Evaluation of Password Recovery in Encrypted Documents

Radek Hranický, Petr Matoušek, Ondřej Ryšavý and Vladimír Veselý

Faculty of Information Technology, Brno University of Technology, Božetěchova 2, 61266, Brno, Czech Republic

Keywords:

Password Recovery, GPU Acceleration, Privacy, Digital Forensics, Encrypted Documents.

Abstract:

Many document formats and archiving tools (PDF, DOC, ZIP) support encryption to protect the privacy of sensitive document contents. The encryption is based on standard cryptographic algorithms such as AES, SHA, and RC4. For forensic purposes, investigators are often challenged to analyze these encrypted documents. The task of password recovery can be solved by an exhaustive state-space search using dictionaries or password generators augmented with heuristic rules to speed up recovery. In our experimental study, we focus on password recovery for common document and archiving formats using parallel computation on conventional hardware with multi-core CPUs or accelerated by GPUs. We show how recovery time can be estimated based on the alphabet, the maximal password length, and the performance of the given hardware. Our results are demonstrated on Wrathion, a tool developed by our research team.

1 INTRODUCTION

File protection is often realized by encrypting the file contents under a user-created password. The legitimate purpose of file protection is to prevent unauthorized access to and misuse of the file contents. Since document encryption is a traditional method of protecting and sharing documents with sensitive information, contradictory requirements arise: passwords have to be strong enough in terms of length and the set of characters used, yet easy to remember for everyday use. The simplest recovery method is a brute-force attack, which eventually finds the correct password by generating an exhaustive set of all possible character combinations. Since most passwords are human-generated, dictionary attacks and rule-based attacks can speed up password recovery significantly.

User habits with respect to password strength have been studied mostly in the Internet environment. The results of research by Florêncio and Herley presented in (Florêncio and Herley, 2007) estimate the character of passwords used for Web sites. The authors observed over half a million users during three months. The average observed password bit strength was around 40 bits. Users also share their passwords across different sites: on average, a user uses a single password for 5.67 sites. According to another study, focused on the password habits of American consumers and performed by CSID, the average password length is about nine characters, and on average 76% of users reuse their passwords across different sites. Based on this, we suppose that many users use their web account passwords to secure documents as well.

Quantized bit strength is computed as $\log_2(|A|^n) = n \log_2 |A|$, where $A$ is an alphabet and $n$ is a password length.

How easy it is to recover encrypted documents was demonstrated by a study of security measures applied to medical files during clinical trials (Emam et al., 2011). Emam et al. were able to recover 93% of passwords using available commercial tools. In another study (Al-Wehaibi et al., 2011), Al-Wehaibi et al. summarize user behavior towards password usage and conduct a new experiment whose results coincide with previous observations. According to their data, 65% of passwords are at most eight characters long, and approximately 8 out of 10 passwords consist only of alphanumerical characters. Our study shows how password recovery using a brute-force attack can be implemented on conventional hardware and with the use of GPU acceleration.

1.1 Contribution

This paper evaluates the resilience of password-protected documents and archives against password recovery attacks. It shows the statistical distribution of characters in available user password datasets and theoretical values of password recovery using massive parallelism. We show experimental results of password recovery on DOC, PDF and ZIP formats. Our approach is to show possible ways of parallelizing password recovery and to underline the potential of GPU-based computation. Our experiments demonstrate the advantage of a GPU-accelerated approach in comparison to a single-CPU computation. Experimental results were obtained with our tool Wrathion and other available tools.

See http://www.csid.com/wp-content/uploads/2012/09/CS_PasswordSurvey_FullReport_FINAL.pdf

Hranický, R., Matoušek, P., Ryšavý, O. and Veselý, V.

Experimental Evaluation of Password Recovery in Encrypted Documents.

DOI: 10.5220/0005685802990306

In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP 2016), pages 299-306

ISBN: 978-989-758-167-0

1.2 Structure of the Paper

The paper is organized as follows. Related research is summarized in Section 2. Section 3 discusses the statistical analysis of characters in user passwords and the estimation of password recovery time for different password lengths. Section 4 describes ways of accelerating password recovery using GPUs. Section 5 describes the password recovery process in Wrathion, a tool developed by our team. Finally, Section 6 presents experimental results of password recovery using different tools and discusses the resilience of documents and archives against brute-force recovery techniques.

2 RELATED WORK

Since cryptographic algorithms consist of simple integer and binary operations, they can be implemented in a multi-processor environment using OpenCL or CUDA, as shown in (Marks and Niewiadomska-Szynkiewicz, 2014). Some encryption algorithms can limit parallelism if synchronization is needed, as discussed in (An et al., 2015), where a combination of SHA-1 hashing and AES decryption on RAR files is presented. This can be overcome by careful division of the password database, as shown in (Apostal et al., 2012). To speed up password recovery further, it is possible to use tables of precomputed hash values, e.g., rainbow tables (Thing and Ying, 2003).

There are also advanced automated techniques which take into account that users choose easily memorable passwords containing existing words, as discussed in (Yampolskiy, 2006). Thus, we can use complex heuristics for generating the most likely passwords, e.g., by using a probabilistic grammar as described in (Weir et al., 2009). Probability-based techniques are used, e.g., by the John the Ripper and oclHashcat tools. However, as we have shown experimentally (see Section 6), even this approach is not robust enough to deal with randomly generated passwords.

See http://www.openwall.com/john/

See http://hashcat.net/oclhashcat/

Non-automated approaches include mathematical analysis, the study of side channels, and other techniques that exploit the weaknesses of a given algorithm. As shown in (Bergen and Caelli, 1990), a ciphertext-only attack on WordPerfect 5.0 documents was successful due to the weak design of the document format, which contained enough known plaintext to guess the encryption key in a short period of time.

3 STATISTICAL PASSWORD ANALYSIS

The complexity of an attack highly depends on the password length and the alphabet used. An ideal password should have sufficient length and include non-alphanumerical characters. The strongest passwords have their characters drawn with uniform probability from the source alphabet. However, users tend to create passwords that exhibit certain patterns in order to ease memorization. In this section, we present the results of our statistical analysis of passwords in use. Such an analysis cannot be considered complete, since it relies only on published datasets of passwords leaked from various services. For objectivity, we also consider machine-generated passwords produced by pwgen. This tool generates secure but easily memorizable passwords.

The statistical analysis of user passwords was performed over the following datasets: myspace is a password list consisting of 37,000 items obtained from a MySpace phishing attack in 2006; phpbb consists of 184,000 passwords stolen from the phpbb.com website in 2009; rockyou is a list of 14,000,000 user passwords from the RockYou in-game video platform leaked in 2009; singles consists of more than 12,000 passwords from the singles.org website leaked in 2010; facebook comprises 2,441 passwords stolen by a Facebook phishing attack in 2010; pwgen1 is a list of 1,000,000 passwords generated by pwgen; and pwgen2 contains 1,000,000 passwords generated by pwgen -s, which generates passwords with a uniform distribution.

Table 1 shows basic statistics about passwords from the different datasets. Interestingly, lower-case letters with numerals (lc+nm) and lower-case letters only (lc) are the dominant classes regardless of the dataset: lc+nm is the largest group in myspace and rockyou, while lc is the largest group in phpbb, singles, and facebook.

See http://sourceforge.net/projects/pwgen/.

See http://wiki.skullsecurity.org/Passwords

ICISSP 2016 - 2nd International Conference on Information Systems Security and Privacy


Table 1: Statistical analysis of password characters, sizes and lengths.

Columns nm to other: distribution of character classes in passwords (%). Columns |A|, avg. length and avg. strength: dataset statistics (alphabet size, average password length in characters, average bit strength).

Dataset   nm     lc     lc+nm  uc    uc+nm  bc    bc+sc  bc+nm  bc+nm+sc  other  |A|  avg. length  avg. strength
myspace   0.72   6.47   75.92  0.24  2.87   0.16  8.42   2.38   2.8       0.02   98   8.59         41.36
phpbb     11.24  41.24  35.70  0.93  1.19   2.68  1.02   4.82   1.16      0.02   96   7.54         36.19
rockyou   16.36  25.98  42.35  1.60  2.84   1.11  3.16   2.66   3.83      0.10   711  8.75         41.36
singles   8.37   55.11  26.48  2.26  0.82   4.37  0.15   2.34   0.11      0.00   63   6.74         32.90
facebook  16.35  38.47  34.29  0.66  0.53   0.70  3.24   1.52   4.1       0.16   90   9.523        41.36
pwgen1    0      0      0      0     0      0     0      100    0         0      62   8            47.63
pwgen2    0      0      0      0     1.6    0     0      98.4   0         0      62   8            47.53

nm: numerical-only passwords; lc/uc/bc: lower/upper/both case only passwords; sc: special characters.

Few passwords include special or non-ASCII characters (bc+nm+sc), although as many as 711 different characters were found in the rockyou dataset. The average password length does not exceed ten characters. The average bit strength is around 41 bits, which corresponds to the findings of (Florêncio and Herley, 2007). Compared to (Al-Wehaibi et al., 2011), the share of alphanumerical passwords is higher, about 90-97%. (Mazurek et al., 2013) report an average password length of 10.7 characters for CMU students and employees, with a higher occurrence of nm, uc, and sc. We assume university students and staff to be more familiar with security issues than the average users of the websites listed above. The CMU research also underlined the significance of the positions in a password where nm/uc/sc characters are placed: the more predictable the position, the weaker the password. According to (Florêncio et al., 2014), users still frequently use weak passwords like 'password' or 'monkey'. About 1% of RockYou users use '123456' as a password. Moreover, as mentioned in Section 1, 76% of users reuse passwords between different sites; thus, we assume there is a high probability that they use the same passwords for document protection.

Table 2 shows how much time it takes to check all possible combinations of alphanumerical passwords (|A| = 62) of different lengths. Speed is measured as the number of passwords generated and verified per second. The lowest value (verification of 10,000 passwords per second) can easily be achieved by a common desktop computer. The highest presented speed (1 billion passwords per second) can be achieved by medium to large scale distributed computing platforms or supercomputers. The presented values show that for lengths ≥ 9, the problem becomes intractable. For a reduced alphabet, e.g., the lc+nm class (|A| = 36), it is possible to recover passwords up to a length of 10.
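The entries of Table 2 follow from the bit-strength formula given in Section 1 and from dividing the number of combinations $|A|^n$ by the verification speed. The Python sketch below reproduces that arithmetic; the function names are ours, and the six rates are the speeds used in the table.

```python
import math

def bit_strength(alphabet_size: int, length: int) -> float:
    """Quantized bit strength log2(|A|^n) = n * log2(|A|)."""
    return length * math.log2(alphabet_size)

def exhaustive_search_seconds(alphabet_size: int, length: int, rate: float) -> float:
    """Worst-case time to test all |A|^n passwords of exactly `length` characters
    when `rate` passwords are generated and verified per second."""
    return alphabet_size ** length / rate

def print_table(alphabet_size: int = 62, lengths=range(5, 10)) -> None:
    """Print one row per length: bit strength, combinations, time in days per rate."""
    rates = [10 ** k for k in range(4, 10)]  # 10^4 ... 10^9 passwords per second
    for n in lengths:
        days = [exhaustive_search_seconds(alphabet_size, n, r) / 86400 for r in rates]
        print(f"{n}  {bit_strength(alphabet_size, n):6.2f}  {alphabet_size ** n:.3g}  "
              + "  ".join(f"{d:.3g} d" for d in days))

if __name__ == "__main__":
    print_table()
```

For example, `exhaustive_search_seconds(36, 10, 10**9)` is about 3.66 million seconds, i.e. roughly 42 days, which is why lc+nm passwords of length 10 remain within reach at the highest speed in the table.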

Password search time can be reduced if we have additional knowledge of password characteristics. To gain such knowledge, various analyses are possible. Table 3 shows the most frequent password characters in the analyzed datasets. Each value represents the percentage of the total character occurrences in passwords from the given dataset. Although the datasets come from services with different password policies and various kinds of users, they have a similar distribution of characters. For example, the letters 'a' and 'e' are the two most used letters in all of them, except those generated by pwgen. To refine the analysis, the occurrences of letters at different positions in passwords can also provide useful information. From this analysis, it can be seen that some letters have a significantly lower probability of occurrence at certain positions.
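Percentages like those in Table 3, and the per-position refinement mentioned above, can be computed with a few counters. This is a minimal Python sketch, assuming a password list is available as an iterable of strings (the datasets themselves are not included); it is not the tooling used to build the table.

```python
from collections import Counter, defaultdict

def char_frequencies(passwords):
    """Share (in %) of all characters in `passwords` taken by each character."""
    counts = Counter()
    for pw in passwords:
        counts.update(pw)
    total = sum(counts.values())
    # most_common() orders the result from the most to the least frequent character
    return {ch: 100.0 * c / total for ch, c in counts.most_common()}

def positional_frequencies(passwords, max_pos=None):
    """For each position i, the share (in %) of each character among the passwords
    that are long enough to have a character at position i."""
    per_pos = defaultdict(Counter)
    for pw in passwords:
        for i, ch in enumerate(pw[:max_pos]):  # max_pos=None keeps the whole password
            per_pos[i][ch] += 1
    return {i: {ch: 100.0 * c / sum(cnt.values()) for ch, c in cnt.items()}
            for i, cnt in sorted(per_pos.items())}
```

Characters with a low share at a given position are natural candidates to be tried late (or skipped) at that position by a generator.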

The presented statistical evaluation provides information that can be used to adjust password recovery methods. Once we know the approximate characteristics of common passwords, the search can be focused on a subspace of the complete password space in order to reduce the password recovery time. This means that combinations of characters that are more likely to occur in a password are tested first. This does not work for all passwords; nevertheless, a significant number of passwords exhibit these properties and can be found faster.

4 GPU ACCELERATION

Fortunately, the password recovery process can be parallelized easily. We can divide the set of possible passwords into subsets, and the passwords from each subset may be generated on a different computation unit. The password verification process can also be parallelized: given the verification value, a single computation unit has no dependency on any other generated password, so different passwords can be verified independently of each other. Even some encryption algorithms like AES can be computed in parallel (Apostal et al., 2012). This makes the password recovery process an ideal candidate for a parallel and/or distributed environment.
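One common way to divide the password set into independent subsets is to number all candidates of a given length and give each computation unit a contiguous range of indices; each unit then converts indices to passwords locally, so no candidate lists have to be transferred. The Python sketch below illustrates this under our own assumptions (a fixed password length, a sequential loop standing in for the units, and a caller-supplied `is_correct` predicate); it is not taken from Wrathion.

```python
from typing import Callable, List, Optional, Tuple

def index_to_password(index: int, alphabet: str, length: int) -> str:
    """Map an integer in [0, |A|^length) to a fixed-length password by reading it
    as `length` base-|A| digits, most significant digit first."""
    base = len(alphabet)
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))

def split_keyspace(total: int, units: int) -> List[Tuple[int, int]]:
    """Split [0, total) into `units` contiguous half-open ranges of near-equal size."""
    step, extra = divmod(total, units)
    ranges, start = [], 0
    for u in range(units):
        end = start + step + (1 if u < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def search_range(alphabet: str, length: int, start: int, end: int,
                 is_correct: Callable[[str], bool]) -> Optional[str]:
    """Work of one computation unit: generate and verify only its own subset."""
    for i in range(start, end):
        candidate = index_to_password(i, alphabet, length)
        if is_correct(candidate):
            return candidate
    return None

if __name__ == "__main__":
    alphabet, length, target = "abcdefghijklmnopqrstuvwxyz", 4, "aczq"
    for start, end in split_keyspace(len(alphabet) ** length, 8):  # 8 hypothetical units
        found = search_range(alphabet, length, start, end, lambda p: p == target)
        if found:
            print(f"found {found!r} in range [{start}, {end})")
            break
```

In a real deployment, each `(start, end)` pair would be handed to a different GPU, process, or host instead of being processed in a loop.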

The acceleration method we focus on in this paper is the use of GPU devices. A GPU is present in almost every PC, which means no special hardware is required. With a bit of exaggeration, we can state that every GPU is, in fact, a small but "highly parallel supercomputer". For example, the AMD Tri-X R9 290X card contains 2816 stream processors. Using OpenCL (Advanced Micro Devices Inc., 2010) or CUDA (NVIDIA Corporation, 2012) to implement algorithms for the GPU, we can perform the password recovery process in a highly parallel environment.

Table 2: Speed of exhaustive password search.

Passwords over an alphabet with |A| = 62. The last six columns give the time needed to check all combinations at the given number of passwords generated and verified per second.

length  bit strength  no. of combinations  10^4          10^5         10^6        10^7       10^8        10^9
5       29.77         916 · 10^6           1 day         2.5 hours    15.25 mins  1.5 mins   9 secs      1 sec
6       35.72         57 · 10^9            66 days       6.5 days     16 hours    1.5 hours  9.5 mins    56 secs
7       41.68         3.5 · 10^12          11 years      1 year       41 days     4 days     10 hours    58 mins
8       47.63         218 · 10^12          692 years     69.25 years  7 years     253 days   25.25 days  60.5 hours
9       53.59         13.5 · 10^15         42,808 years  4,280 years  428 years   42 years   4.2 years   156 days

Table 3: Most used password characters (in %).

dataset   a     e      i     o     1     r     n     s     l     h     2     0     t     m     3     c
myspace   6.70  7.18   4.57  4.83  5.79  4.28  3.94  4.19  4.07  2.54  3.11  2.43  3.39  2.69  2.15  2.82
phpbb     7.00  6.37   4.49  4.71  4.38  4.46  4.25  4.12  3.55  2.32  3.26  3.26  3.58  2.87  2.49  2.49
rockyou   7.05  5.75   4.44  4.13  5.37  3.66  3.86  3.32  3.56  1.87  4.18  4.58  2.74  2.56  3.00  2.08
singles   8.17  8.19   5.15  5.73  3.36  4.74  4.91  4.71  4.54  2.41  2.35  2.12  3.53  3.17  1.60  2.52
facebook  8.39  5.60   4.96  4.49  5.32  4.27  4.15  3.87  3.40  2.33  3.84  3.60  2.98  2.61  2.94  2.27
pwgen1    8.28  12.58  8.92  8.59  1.46  0.77  1.06  1.28  0.77  8.72  1.45  1.44  1.28  0.77  1.45  1.28
pwgen2    0.19  0.19   0.19  0.19  0.27  0.19  0.19  0.19  0.19  0.19  0.27  0.27  0.19  0.19  0.27  0.19
all       7.83  7.61   5.42  5.41  4.28  3.57  3.52  3.37  3.19  3.05  3.03  2.90  2.70  2.32  2.01  1.90

Regardless of whether the AMD or NVIDIA architecture is used, computation on a GPU is divided into so-called work-items, each of which has its own private memory. A workgroup is a collection of work-items sharing a local memory. There is also a global memory on each GPU device, which can be accessed from the host machine (Advanced Micro Devices Inc., 2010; NVIDIA Corporation, 2012).

GPU-accelerated computation can be used for both password generation and password verification. It is also possible to perform a hybrid CPU/GPU password recovery process. The scalability potential is high: a single host machine may contain multiple GPUs, while multiple host machines may be connected together to create a computation cluster. In such a cluster, host machines can communicate using OpenMPI or another technique.
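In a hybrid CPU/GPU setup or a cluster, the devices differ in speed, so equal-sized ranges would leave the fast devices idle. A simple remedy, sketched below in Python as an assumption rather than a description of Wrathion, is to size each device's range in proportion to its measured throughput. The device names and rates are hypothetical placeholders, and the communication layer (e.g., OpenMPI) is left out.

```python
from typing import Dict, Tuple

def split_by_throughput(total: int, rates: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Give each device a contiguous slice of [0, total) proportional to its measured
    throughput (integer passwords per second), so all devices finish at about the same time."""
    rate_sum = sum(rates.values())
    names = list(rates)
    slices: Dict[str, Tuple[int, int]] = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = total  # the last device absorbs the remainder left by integer division
        else:
            end = start + total * rates[name] // rate_sum
        slices[name] = (start, end)
        start = end
    return slices

if __name__ == "__main__":
    # Hypothetical devices; the rates stand in for benchmarks measured on the real hardware.
    devices = {"cpu": 143_000, "gpu0": 2_622_000, "gpu1": 2_622_000}
    for name, (lo, hi) in split_by_throughput(62 ** 6, devices).items():
        print(f"{name}: [{lo}, {hi})  ~{(hi - lo) / devices[name]:.0f} s")
```

Integer arithmetic is used so that the slices cover the whole keyspace exactly, without gaps or overlaps, even for very large totals.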

5 PASSWORD RECOVERY

A typical architecture of a password recovery tool, as implemented in our tool Wrathion, is shown in Figure 1. This architecture includes a core module with password generators and specific modules (crackers) for the password recovery of each file format (e.g., DOC, PDF, ZIP). The generation of candidate passwords should be independent of the file format being recovered. Both recovery modules and password generators can support computation on the CPU and/or GPU.

See http://www.sapphiretech.com/

See http://www.open-mpi.org/

The password recovery process is composed of the following steps: (i) document identification and detection of the cryptographic algorithm(s), (ii) password generation, and (iii) password verification.

Figure 1: Architecture of Wrathion.

5.1 Identification of File Format

Identification of the file format is the first task that has to be performed. Usually, the file format can be identified from a signature within the file. The signature is a unique string that is mostly located at the beginning of the file. Once the file format is identified, the document has to be parsed to obtain metadata related to encryption (algorithms used, etc.).

See http://www.filesignatures.net/
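As an illustration of signature-based identification, the Python sketch below matches the first bytes of a file against three well-known signatures: `%PDF-` for PDF, `PK\x03\x04` for a ZIP local file header, and the OLE2 compound file header `D0 CF 11 E0 A1 B1 1A E1` used by DOC 97-2003. The table is deliberately minimal and is our assumption, not Wrathion's list. Note that encrypted Office 2007+ documents are also stored in an OLE2 container, so the signature only tells the parser where to look next.

```python
# Hypothetical minimal signature table: (magic bytes, format label).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP"),                        # ZIP local file header
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2"),   # DOC 97-2003, also encrypted OOXML
]

def identify(header: bytes) -> str:
    """Return the label of the first signature that `header` starts with."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"

def identify_file(path: str) -> str:
    """Read just enough bytes from the start of the file to match any signature."""
    with open(path, "rb") as f:
        return identify(f.read(max(len(magic) for magic, _ in SIGNATURES)))
```

After a match, a format-specific parser would extract the encryption metadata (algorithm identifiers, salts, verification values) used in the later phases.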


5.2 Password Generation

Password generation is an essential function of any password recovery tool. The generator creates different combinations of strings that are tested (verified) as candidate passwords. Generators need to support national alphabets, since some documents may be encrypted with passwords consisting of non-ASCII symbols. Based on the observations from the previous sections, the following types of generators can be employed for password recovery of user documents:

5.2.1 Dictionary Password Generator

The dictionary generator takes a given dictionary as a source of possible passwords. Such an approach is often successful, since many users use popular passwords like "123456" or "qwerty", as mentioned in (Florêncio et al., 2014). The use of dictionaries can substantially reduce the password search space.

5.2.2 Brute Force Password Generator

In the case of complicated passwords, a brute-force password generator has to be used. This generator creates every possible string over a given alphabet up to the maximum length.
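A brute-force generator of this kind is essentially a Cartesian product of the alphabet with itself. The Python sketch below enumerates candidates from the shortest to the longest length; it runs in a single CPU thread and only illustrates the enumeration order, not a GPU implementation.

```python
from itertools import product
from typing import Iterator

def brute_force(alphabet: str, max_length: int, min_length: int = 1) -> Iterator[str]:
    """Yield every string over `alphabet` with min_length..max_length characters,
    shorter strings first and in alphabet order within each length."""
    for length in range(min_length, max_length + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)
```

For a maximum length $n$, the generator yields $\sum_{k=1}^{n} |A|^k$ candidates, so the longest length dominates the total running time.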

5.2.3 Rule-based Generator

As described in Section 2, we can use advanced probability-based techniques to reduce the password search space. These involve letter prediction, digraph/trigraph frequency analysis, letter position guessing, etc. Such generation may be based on a set of heuristics or on a probabilistic grammar.
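A simple heuristic (rule-based) generator can be sketched as a list of word-mangling rules applied to dictionary words, with the rules ordered so that the presumably more likely transformations are tried first. The rules below (capitalization, a trailing digit, the suffix `123`, and a small leetspeak substitution) are hypothetical examples chosen for illustration; they are not the rule syntax of John the Ripper or oclHashcat, and no probabilistic grammar is implemented.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical leetspeak substitution table.
LEET = str.maketrans({"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"})

# Each rule turns one dictionary word into zero or more candidates.
# Rules are listed from (presumably) most to least likely.
RULES: List[Callable[[str], List[str]]] = [
    lambda w: [w],                               # password
    lambda w: [w.capitalize()],                  # Password
    lambda w: [w + str(d) for d in range(10)],   # password0 .. password9
    lambda w: [w + "123"],                       # password123
    lambda w: [w.translate(LEET)],               # p@$$w0rd
]

def rule_based(words: Iterable[str]) -> Iterator[str]:
    """Apply each rule to every word before moving on to the next rule,
    skipping candidates that were already produced."""
    words = list(words)  # the word list is traversed once per rule
    seen = set()
    for rule in RULES:
        for word in words:
            for candidate in rule(word):
                if candidate not in seen:
                    seen.add(candidate)
                    yield candidate
```

The `seen` set keeps duplicates out of the verification stage at the cost of memory proportional to the number of generated candidates.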

5.3 Password Verification

The password verification procedure is usually defined by the creator of the document format and is unique to each document format and its version. In this phase, we use the metadata obtained in the first phase (identification). Such metadata usually include a part called the password verification value. The verification method usually consists of one or more hash algorithms, possibly combined with one or more encryption algorithms. The result is then compared with the verification value; if they match, the password is considered correct. In some cases, an input password has to be combined with a salt, which is usually a particular string stored inside the document.
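The general pattern (derive a value from the candidate password and the stored salt, then compare it with the stored verification value) can be sketched as follows. The scheme used here, SHA-256 over salt followed by password and re-hashed a configurable number of times, is a hypothetical stand-in; each real format defines its own derivation, as the following subsections show.

```python
import hashlib
from typing import Iterable, Optional

def derive_verifier(password: str, salt: bytes, rounds: int = 1) -> bytes:
    """Hypothetical scheme: SHA-256(salt || password), re-hashed `rounds - 1` more times."""
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest).digest()
    return digest

def recover(candidates: Iterable[str], salt: bytes, verifier: bytes,
            rounds: int = 1) -> Optional[str]:
    """Return the first candidate whose derived value matches the stored verifier."""
    for password in candidates:
        if derive_verifier(password, salt, rounds) == verifier:
            return password
    return None
```

The `rounds` parameter shows why iterated hashing slows recovery: the cost per candidate grows linearly with it, while the cost for a legitimate user opening the document remains a single derivation.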

5.3.1 Microsoft Office

The encryption technique in Office 95 documents is an XOR of the cascaded password with the document content. Since only a 16-bit key is used, modern password recovery software can find the password instantly.

In Office 97/2000 documents, password verification is performed using the 40-bit RC4 stream cipher in combination with the MD5 hash algorithm. However, when the document is modified and saved, the initialization vector remains the same, which allows a known-plaintext attack, as described in (Wu, 2005).

In Office XP and 2003, the default protection remained the same, but an option to use a custom protection algorithm was added. Starting with Office 2007, AES is used for encryption, and the password verification algorithm consists of 50,000 rounds of SHA-1.
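The 50,000 SHA-1 rounds dominate the cost of testing one candidate. The Python sketch below follows our reading of the key-derivation description for ECMA-376 Standard Encryption in Microsoft's MS-OFFCRYPTO documentation: the UTF-16LE password is hashed together with the salt, re-hashed 50,000 times with a 32-bit little-endian counter prepended, and finally hashed with a 4-byte block number. The subsequent steps that expand this value into an AES key and check the encrypted verifier are omitted, so this illustrates the workload rather than a complete verifier.

```python
import hashlib

def office2007_hash(password: str, salt: bytes, spin_count: int = 50000) -> bytes:
    """Iterated SHA-1 of ECMA-376 Standard Encryption (our reading of MS-OFFCRYPTO).

    H0     = SHA1(salt || UTF-16LE(password))
    Hn     = SHA1(LE32(n) || Hn-1)  for n = 0 .. spin_count-1
    Hfinal = SHA1(Hlast || LE32(0))  # block number 0
    The later expansion of Hfinal into an AES key is omitted.
    """
    h = hashlib.sha1(salt + password.encode("utf-16-le")).digest()
    for i in range(spin_count):
        h = hashlib.sha1(i.to_bytes(4, "little") + h).digest()
    return hashlib.sha1(h + (0).to_bytes(4, "little")).digest()
```

Because every round depends on the previous one, the 50,000 rounds cannot be parallelized within one candidate; GPU speedup comes only from testing many candidates at once.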

5.3.2 Portable Document Format

PDF documents can be secured by two passwords: the user password and the owner password. The owner password is only used to restrict selected operations within the file and can be ignored for recovery purposes. The user password is used for encryption of the document.

Like Office 97/2000, PDF security revisions 1 to 4 use a combination of RC4 and MD5 for password verification. In revisions 3 and 4, the MD5 hash is additionally iterated 50 times, as described in (Adobe Systems Incorporated, 2008b).
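For revisions 3 and 4, the check can be sketched in Python following our reading of Algorithms 2 and 5 of ISO 32000-1: the padded password, the O entry, the P value and the first element of the document ID are hashed with MD5, the result is re-hashed 50 times, and the resulting key is used to RC4-encrypt MD5(padding || ID) twenty times with varying keys; the output must match the first 16 bytes of the U entry. We assume the password is already encoded as bytes (PDFDocEncoding) and that the dictionary values have been parsed from the file; revision 2, which uses a simpler algorithm, is not covered.

```python
import hashlib

# 32-byte padding string defined in ISO 32000-1 (Algorithm 2, step a).
PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108" "2E2E00B6D0683E802F0CA9FE6453697A")

def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

def compute_key(password: bytes, o_entry: bytes, p: int, doc_id: bytes,
                key_len: int = 16, encrypt_metadata: bool = True) -> bytes:
    """Algorithm 2 for revisions 3 and 4 (key_len in bytes, 5 to 16)."""
    md = hashlib.md5((password + PAD)[:32])
    md.update(o_entry)
    md.update((p & 0xFFFFFFFF).to_bytes(4, "little"))  # P is a signed 32-bit integer
    md.update(doc_id)
    if not encrypt_metadata:
        md.update(b"\xff\xff\xff\xff")                  # revision 4 only
    digest = md.digest()
    for _ in range(50):                                # the 50 extra MD5 rounds
        digest = hashlib.md5(digest[:key_len]).digest()
    return digest[:key_len]

def compute_u(key: bytes, doc_id: bytes) -> bytes:
    """Algorithm 5: first 16 bytes of the U entry."""
    data = rc4(key, hashlib.md5(PAD + doc_id).digest())  # MD5 part is password-independent
    for i in range(1, 20):
        data = rc4(bytes(b ^ i for b in key), data)
    return data

def check_user_password(password: bytes, o_entry: bytes, p: int, doc_id: bytes,
                        u_entry: bytes, key_len: int = 16) -> bool:
    key = compute_key(password, o_entry, p, doc_id, key_len)
    return compute_u(key, doc_id) == u_entry[:16]
```

The MD5 over the padding string and the document ID does not depend on the password, so it can be computed once per document rather than once per candidate.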

In security revision 5, the password in UTF-8 encoding is combined with a salt and processed with a single round of SHA-256, as described in (Adobe Systems Incorporated, 2008a). As our measurements in Section 6 show, this scheme is weaker than that of revision 4.
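In revision 5, the U entry is 48 bytes long: a 32-byte SHA-256 hash followed by an 8-byte validation salt and an 8-byte key salt. A candidate is correct if SHA-256 of the UTF-8 password (truncated to 127 bytes) followed by the validation salt equals the first 32 bytes of U. The Python sketch below assumes the U entry has already been extracted from the encryption dictionary and skips the SASLprep normalization that the specification requires for non-ASCII passwords.

```python
import hashlib

def check_r5_user_password(password: str, u_entry: bytes) -> bool:
    """Revision 5 user-password check.

    U layout: bytes 0-31 SHA-256 hash, 32-39 validation salt, 40-47 key salt.
    SASLprep normalization is NOT performed (assumption: ASCII or pre-normalized input).
    """
    pw = password.encode("utf-8")[:127]  # at most 127 bytes of UTF-8
    validation_salt = u_entry[32:40]
    return hashlib.sha256(pw + validation_salt).digest() == u_entry[:32]
```

One SHA-256 per candidate is far cheaper than the 51 MD5 computations and 20 RC4 runs needed per candidate in revision 4, which is consistent with the much higher revision 5 rates in Table 4.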

Security revision 6 (PDF 2.0) uses a strong password verification algorithm described by ISO 32000-2, but this specification is not yet publicly available. However, pseudocode of the algorithm has leaked.

See https://msdn.microsoft.com/en-us/library/cc313105.aspx

See http://esec-lab.sogeti.com/posts/2011/09/14/the-undocumented-password-validation-algorithm-of-adobe-reader-x.html

5.3.3 ZIP

The original Zip 2.0 (PKZIP) encryption uses a very simple cipher, which is also vulnerable to a known-plaintext attack. Password verification is done by comparing an 8-bit value obtained in the initialization phase of the algorithm (the last byte of the decrypted 12-byte encryption header) with the most significant 8 bits of the file's CRC-32 checksum.
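The traditional PKWARE scheme initializes three 32-bit keys from the password and then decrypts the 12-byte encryption header stored before the file data; only the last header byte can be checked. The Python sketch below follows our reading of the APPNOTE description and handles only the case where the check byte is the high-order byte of the CRC-32 (some archivers use a byte of the modification time instead). Because only one byte is compared, roughly one wrong candidate in 256 passes the check, so a hit still has to be confirmed, e.g. by decrypting and decompressing the file and checking its CRC.

```python
def _make_crc_table() -> list:
    """Table for the reflected CRC-32 polynomial 0xEDB88320 used by ZIP."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRC_TABLE = _make_crc_table()

def crc32_step(crc: int, byte: int) -> int:
    """One raw CRC-32 update step (no initial or final inversion)."""
    return (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]

class ZipCryptoKeys:
    """The three 32-bit keys of traditional PKWARE encryption."""

    def __init__(self, password: bytes):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:
            self.update(b)

    def update(self, byte: int) -> None:
        self.k0 = crc32_step(self.k0, byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc32_step(self.k2, self.k1 >> 24)

    def stream_byte(self) -> int:
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF

def decrypt_header(password: bytes, header: bytes) -> bytes:
    """Decrypt the 12-byte encryption header; keys are updated with the plaintext."""
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for c in header:
        p = c ^ keys.stream_byte()
        keys.update(p)
        out.append(p)
    return bytes(out)

def encrypt_header(password: bytes, plain: bytes) -> bytes:
    """Inverse of decrypt_header (useful for building test data)."""
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for p in plain:
        out.append(p ^ keys.stream_byte())
        keys.update(p)
    return bytes(out)

def check_zipcrypto_password(password: bytes, header: bytes, crc32: int) -> bool:
    """Compare the last decrypted header byte with the high-order byte of the CRC-32."""
    return decrypt_header(password, header)[11] == (crc32 >> 24) & 0xFF
```

The per-candidate work is only a few table lookups per password byte plus twelve header bytes, which is why this scheme is so fast to attack compared with the AES-based format.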

Modern ZIP formats mostly use AES encryption with 128-, 192- or 256-bit keys. Password verification is performed using the PBKDF2 and HMAC-SHA1 algorithms.
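In the WinZip AES (AE-x) scheme, PBKDF2-HMAC-SHA1 with 1,000 iterations turns the password and an 8-, 12- or 16-byte salt into an AES key, an HMAC key of the same length, and a 2-byte password verification value stored next to the salt. The Python sketch below derives all three with `hashlib.pbkdf2_hmac` and also shows that the verification value can be obtained from a single 20-byte PBKDF2 output block. The helper names are ours.

```python
import hashlib
import hmac

KEY_LENGTHS = {128: 16, 192: 24, 256: 32}  # AES strength -> key bytes; salt is half of it

def winzip_aes_keys(password: bytes, salt: bytes, strength: int):
    """Return (AES key, HMAC key, 2-byte password verification value)."""
    key_len = KEY_LENGTHS[strength]
    if len(salt) != key_len // 2:
        raise ValueError("salt must be 8, 12 or 16 bytes for AES-128/192/256")
    dk = hashlib.pbkdf2_hmac("sha1", password, salt, 1000, dklen=2 * key_len + 2)
    return dk[:key_len], dk[key_len:2 * key_len], dk[2 * key_len:]

def check_winzip_aes_password(password: bytes, salt: bytes, verifier: bytes,
                              strength: int = 256) -> bool:
    return winzip_aes_keys(password, salt, strength)[2] == verifier

def pbkdf2_sha1_block(password: bytes, salt: bytes, iterations: int, index: int) -> bytes:
    """Compute only the index-th (1-based) 20-byte output block of PBKDF2-HMAC-SHA1."""
    u = hmac.new(password, salt + index.to_bytes(4, "big"), hashlib.sha1).digest()
    acc = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha1).digest()
        acc ^= int.from_bytes(u, "big")
    return acc.to_bytes(20, "big")

def fast_verifier(password: bytes, salt: bytes, strength: int) -> bytes:
    """Derive just the 2 verification bytes from the one block that contains them."""
    offset = 2 * KEY_LENGTHS[strength]        # verifier position in the derived key
    block, start = divmod(offset, 20)         # it never straddles a block boundary here
    return pbkdf2_sha1_block(password, salt, 1000, block + 1)[start:start + 2]
```

Because the two verification bytes always fall inside one PBKDF2 block (block 2 for AES-128, block 4 for AES-256), a verifier that computes only that block does the same amount of work for both key sizes, which is consistent with the nearly identical AES-128 and AES-256 rates reported in Section 6.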

6 PRELIMINARY RESULTS

In this section, we present experimental results for Wrathion. For each module, we measured performance and time, and calculated the achieved GPU acceleration in comparison with the CPU-only method. The testing machine contained an Intel(R) Core i7 CPU 920 @ 2.67 GHz processor, 16 GB of DDR3 RAM, and two AMD Tri-X R9 290X GPU cards.

We also compared Wrathion with other password recovery tools: oclHashcat, John the Ripper 1.8.0-jumbo-1 (John), Elcomsoft Advanced Office Password Recovery 6.10 (AOPR), Elcomsoft Advanced PDF Password Recovery 5.06 (APPR), and Elcomsoft Advanced Archive Password Recovery 4.54 (AAPR). Since APPR offers only CUDA-based acceleration, we used an NVIDIA GeForce GTX 660Ti for this case instead. Since oclHashcat is GPU-only, and its predecessor Hashcat does not support any of the analyzed formats, we could not determine its GPU acceleration.

6.1 Performance of Wrathion

The first three rows of Table 4 show the performance (perf.) in passwords per second and the GPU acceleration (acc.) of Wrathion. The highest acceleration was achieved on ZIP archives encrypted with AES. For AES-256, single-GPU recovery was 30.28 times faster than the CPU method, and dual-GPU recovery was 60.56 times faster. ZIP-AES uses PBKDF2 in combination with the SHA-1 hash function for key generation. These algorithms offer plenty of room for parallelization, which was utilized efficiently in our OpenCL kernels. There is almost no time difference between the password recovery of ZIP archives encrypted with AES-128 and AES-256, since AES is only the encryption algorithm, while password verification uses PBKDF2 and SHA-1.

DOC and PDF revisions 3 and 4 use the RC4 cipher with the MD5 hash function repeated multiple times. Since RC4 has high memory requirements (a 256-byte state per candidate), we made our kernels faster by putting the whole S-box into GPU local memory. Another 12% speedup was achieved by storing the key in a vector inside a standard array.

See https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.3.TXT

See http://www.elcomsoft.com/

In PDF revisions 3 and 4, one MD5 hash is computed from the fixed padding string used for password alignment concatenated with the document ID from the file trailer. Since this value does not change during the recovery process, Wrathion calculates the hash on the CPU before password recovery starts and then sends it to the GPU.

6.2 Performance Comparison

Table 4 also shows the performance and GPU acceleration of the other tools. For each document format, we selected one encryption method for chart illustration.

Figure 2 shows the performance of oclHashcat, AOPR, John, and Wrathion. The document type chosen was MS Office DOC 97/2000. On this old document format, AOPR and John showed rather poor performance. oclHashcat was much more efficient than AOPR but fell behind Wrathion, possibly because its creators consider the DOC format obsolete and have not optimized their tool much for it.

Figure 2: Performance of tools (DOC 97/2000).

Figure 3 shows the performance of APPR, oclHashcat, and Wrathion on PDF 1.7 revision 4. For CPU password recovery, the performance difference between APPR and Wrathion was not dramatic. However, with GPU acceleration, Wrathion and oclHashcat turned out to be much more efficient. For revision 4, Wrathion and oclHashcat have similar performance. However, for revision 5 (see Table 4), oclHashcat turned out to be faster than Wrathion.

Figure 3: Performance of tools (PDF R4).

Figure 4 compares AAPR, John, and Wrathion on ZIP encrypted with AES-256. Unfortunately, oclHashcat does not support the ZIP format, and AAPR is CPU-only. However, on the CPU, AAPR showed the best performance of all the tools tested.

Table 4: Performance (pass./s) and GPU acceleration using different tools on document formats.

Each cell gives the performance (perf.) in passwords per second, followed in parentheses by the GPU acceleration (acc.) relative to the same tool's CPU performance; no parentheses: acceleration not applicable or not determined; n/a: format not supported.

Tool                 ZIP AES-128    ZIP AES-256    DOC 97/2000      PDF Rev 4        PDF Rev 5
Wrathion (CPU)       4.3k           4,329          2,985k           143k             7,425k
Wrathion (1x GPU)    131k (30.34)   131k (30.28)   18,548k (6.21)   2,622k (18.36)   82,735k (11.14)
Wrathion (2x GPU)    262k (60.69)   262k (60.56)   35,262k (11.81)  4,522k (31.67)   136,913k (18.44)
oclHashcat (1x GPU)  n/a            n/a            7,766k           2,596k           144,074k
oclHashcat (2x GPU)  n/a            n/a            15,377k          5,121k           286,340k
John (CPU)           649            376            115k             n/a              n/a
John (1x GPU)        28.7k (44.28)  14.8k (39.38)  861k (7.5)       n/a              n/a
Elcomsoft (CPU)      8.1k           8.1k           20k              33k              28,861k
Elcomsoft (1x GPU)   n/a            n/a            26k (1.31)       69k (2.1)        n/a

Figure 4: Performance of tools (ZIP AES-256).

6.3 Time Comparison

We also measured the real recovery times of Wrathion and the other tools. In this experiment, all tools were in brute-force mode with the lowercase Latin alphabet (a-z). The results for lengths from 4 to 7 characters (from 'aczq' to 'aczqfbi') are given in Table 5. It is worth comparing these results with the theoretical estimations from Table 2. The recovery time of shorter passwords was around 1 second, while the recovery of longer passwords could take days. In cells marked with 'app.', the time was approximated due to the long duration. Cells marked with 'err' show cases where we were not able to successfully recover the correct password (even after many attempts) due to a possible bug in the program leading to an error.

The results shown in Table 5 roughly correspond to the previously measured performance. Wrathion needs at least 1 second, since this is the interval for communication with its working threads. For encrypted DOC, AOPR showed an anomaly when generating passwords of 6 or more characters: even though we defined the alphabet as lowercase Latin letters, for some reason the passwords were generated from numbers, etc. As mentioned above, APPR supports only CUDA-based acceleration. However, even with an NVIDIA card with CUDA support, the recovery of PDF revision 5 always ended with a program error.

We also encountered another anomaly connected with ZIP-AES. Using Wrathion and the Elcomsoft tools, the recovery times of 128-bit and 256-bit AES were fairly comparable (due to the password verification method used). However, with John, the recovery of 256-bit AES took about twice as long as with 128-bit AES. We do not yet know the cause of this behavior.

Table 5: Measurement of password recovery time using different tools (columns: password length in characters).

DOC 97/2000
Tool               4        5        6          7
Wrathion (CPU)     1.00s    1.00s    5.02s      1m 52s
Wrathion (GPU)     1.00s    1.00s    1.00s      17.00s
oclHashcat (GPU)   < 1s     < 1s     2.36s      40.6s
John (CPU)         0.87s    5.07s    1m 52s     40m 50s
John (GPU)         3.06s    3.61s    14.92s     4m 4s
Elcomsoft (CPU)    6.12s    25.41s   err        err
Elcomsoft (GPU)    5.40s    19.35s   err        err

ZIP AES 128-bit
Tool               4        5        6          7
Wrathion (CPU)     5.06s    1m 55s   49m 40s    21h 29m
Wrathion (GPU)     2.01s    5.27s    1m 40s     42m 35s
John (CPU)         29.28s   11m 27s  4h 47m     app. 6d
John (GPU)         6.05s    22.23s   7m 27s     3h 12m
Elcomsoft (CPU)    2.33s    59.77s   26m 24s    10h 23m

ZIP AES 256-bit
Tool               4        5        6          7
Wrathion (CPU)     5.03s    1m 55s   49m 41s    21h 29m
Wrathion (GPU)     2.01s    6.24s    1m 41s     43m 22s
John (CPU)         54.76s   21m 54s  9h 29m     app. 10d
John (GPU)         6.75s    38.01s   14m 28s    6h 17m
Elcomsoft (CPU)    2.30s    1m 1s    26m 5s     10h 27m

PDF Revision 4
Tool               4        5        6          7
Wrathion (CPU)     1.03s    4.03s    1m 30s     38m 50s
Wrathion (GPU)     1.00s    1.00s    6.00s      2m 10s
oclHashcat (GPU)   < 1s     < 1s     26.68s     2m 11s
Elcomsoft (CPU)    0.42s    7.66s    3m 17s     2h 49m
Elcomsoft (GPU)    0.49s    7.12s    3m 15s     1h 23m

PDF Revision 5
Tool               4        5        6          7
Wrathion (CPU)     1.00s    1.00s    2.00s      45.04s
Wrathion (GPU)     1.00s    1.04s    1.03s      4.04s
oclHashcat (GPU)   < 1s     < 1s     < 1s       2.32s
Elcomsoft (CPU)    0.67s    0.62s    0.95s      12.37s
Elcomsoft (GPU)    err      err      err        err


7 CONCLUSION

The basic method of password recovery is an exhaustive search. Despite the huge number of possible combinations, currently available hardware and the limited strength of common passwords keep this method relevant. With the support of high-end GPUs, the password recovery process can be performed in a fraction of the time required by CPU-only computation. Password recovery also offers a lot of room for parallelization, and the scalability potential of the process is high.

In the future, we want to focus mainly on password recovery in a distributed environment. We also plan to extend Wrathion with modules for other file formats and with more sophisticated password generators. Finally, we want to compare our tool with other software, e.g., AccessData Password Recovery Toolkit.

See http://accessdata.com

ACKNOWLEDGEMENTS

Research presented in this paper is supported by the project "Modern Tools for Detection and Mitigation of Cyber Criminality on the New Generation Internet", no. VG20102015022, granted by the Ministry of the Interior of the Czech Republic, and the project "Research and application of advanced methods in ICT", no. FIT-S-14-2299, granted by Brno University of Technology.

REFERENCES

Adobe Systems Incorporated (2008a). Adobe Supplement to the ISO 32000, BaseVersion: 1.7, ExtensionLevel: 3. Adobe Systems Incorporated.

Adobe Systems Incorporated (2008b). Document management - Portable document format - Part 1: PDF 1.7. Number 32000-1:2008. International Organization for Standardization, Geneva, Switzerland.

Advanced Micro Devices Inc. (2010). Introduction to OpenCL programming - Training guide. Number 137-41768-10. Advanced Micro Devices Inc.

Al-Wehaibi, K., Storer, T., and Glisson, W. B. (2011). Augmenting password recovery with online profiling. Digit. Investig., 8:S25–S33.

An, X., Zhao, H., Ding, L., Fan, Z., and Wang, H. (2015). Optimized password recovery for encrypted RAR on GPUs. CoRR.

Apostal, D., Foerster, K., Chatterjee, A., and Desell, T. (2012). Password recovery using MPI and CUDA. In Proc. of HiPS 2012, pages 1–9.

Bergen, H. A. and Caelli, W. J. (1990). File Security in WordPerfect 5.0.

Emam, K. E., Moreau, K., and Jonker, E. (2011). How strong are passwords used to protect personal health information in clinical trials? Journal of Medical Internet Research, 13(1).

Florêncio, D. and Herley, C. (2007). A large-scale study of web password habits. In Proc. of the 16th Int. Conference on WWW, pages 657–666.

Florêncio, D., Herley, C., and Oorschot, P. C. V. (2014). An Administrator's Guide to Internet Password Research. In Proceedings of the 28th USENIX Conference on Large Installation System Administration, LISA'14, pages 35–52. USENIX Association.

Marks, M. and Niewiadomska-Szynkiewicz, E. (2014). Hybrid CPU/GPU platform for high performance computing. In Proc. of the 28th ECMS, pages 523–537.

Mazurek, M. L., Komanduri, S., Vidas, T., Bauer, L., Christin, N., Cranor, L. F., Kelley, P. G., Shay, R., and Ur, B. (2013). Measuring Password Guessability for an Entire University. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS '13, pages 173–186. ACM.

NVIDIA Corporation (2012). NVIDIA CUDA C Programming Guide. NVIDIA Corporation.

Thing, V. L. and Ying, H.-M. (2003). Making a faster cryptanalytic time-memory trade-off. Advances in Cryptology, pages 617–630.

Weir, M., Aggarwal, S., de Medeiros, B., and Glodek, B. (2009). Password Cracking Using Probabilistic Context-Free Grammars. In Security and Privacy, 2009 30th IEEE Symposium on, pages 391–405.

Wu, H. (2005). The misuse of RC4 in Microsoft Word and Excel.

Yampolskiy, R. (2006). Analyzing user password selection behavior for reduction of password space. In Carnahan Conferences Security Technology, Proceedings 2006 40th Annual IEEE International, pages 109–115.
