Computer Viruses: The Abstract Theory Revisited
Nikolai Gladychev
a
Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
Keywords:
Computer Virus, Computability, Abstract Theory, Recursion Theorem, Companion Virus, Document Virus,
Computer Virology.
Abstract:
Identifying new viral threats, and developing long term defences against current and future computer viruses,
requires an understanding of their behaviour, structure and capabilities. This paper aims to advance this un-
derstanding by further developing the abstract theory of computer viruses. A method of providing abstract
definitions for classes of viruses is presented in this paper, which addresses inadequacies of previous tech-
niques. Formal definitions for some classes of viruses are then provided, which correspond to existing infor-
mal definitions. The use of the proposed method in studying the fundamental properties of computer viruses
is discussed.
1 INTRODUCTION
Current antiviral detection methods and techniques are largely reactive, with antivirus software
being updated in response to newly discovered viruses and threats (Filiol, 2005) (Dechaux and
Filiol, 2016). There is an “arms race” between computer virus and antivirus writers (Kramer and
Bradfield, 2010), and any antiviral techniques developed for current computer viruses are
ultimately bypassed by new and more advanced viral behaviours. A more proactive approach would be
to detect new threats before they emerge in the real world, for which a thorough understanding of
the possible behaviours, structures and capabilities of computer viruses is required. It is the
aim of the abstract theory of computer viruses to view the underlying mechanisms and principles of
computer viruses independently of implementation complexities, and to provide more general results
about computer viruses. It is the aim of this paper to further this abstract understanding of
computer viruses.
a https://orcid.org/0000-0002-9744-6169
Malware is a more general concept than viruses, and computer viruses are commonly understood as
malware which has some kind of self-replicating or self-propagating mechanism (Filiol, 2005)
(Cohen, 1986). Nevertheless, computer viruses remain highly relevant to the modern context, with
network-propagating trojans (worms) and botnets relating to viruses. Computer virus models can be
extended to capture malware in general by allowing for non-replicating programs (Adleman, 1990);
however, this paper is concerned with the self-replicating case. The main contributions of this
paper are as follows.
• A formal method for specifying computer viruses which has its roots in computability theory is
  proposed. The possible specifications are broader in scope and are more expressive than those
  possible using previous methods.
• A number of formal descriptions of virus classes are presented, which correspond to existing
  informal classifications of viruses, and which could not be previously described in a formal way.
Section 2 of the paper reviews how recursion theory (also known as computability theory) relates
to self-replicating programs and computer viruses. Related work is discussed in section 3.1, and
some inadequacies are noted. A formal framework and methodology for specifying computer viruses is
presented in section 3.2, which addresses these inadequacies. Section 4 goes on to use this
framework to provide formal counterparts to a number of informal classifications of computer
viruses. In particular, classes which could not be formally specified using previous methods are
presented. Section 4.2 demonstrates how the framework in this paper can be used to study
fundamental aspects of the structure and behaviour of computer viruses. Section 5 provides
concluding remarks.
Gladychev, N.
Computer Viruses: The Abstract Theory Revisited.
DOI: 10.5220/0008942704060414
In Proceedings of the 6th International Conference on Information Systems Security and Privacy (ICISSP 2020), pages 406-414
ISBN: 978-989-758-399-5; ISSN: 2184-4356
Copyright © 2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
2 COMPUTER VIRUSES IN RECURSION THEORY
This paper will describe computer viruses in functional terms using standard mathematical
notation. In the real world, like any program, computer viruses appear as some sequence of
instructions. Partial recursive functions are those functions which can be computed by some
sequence of instructions¹ (Rogers, 1987), and the theory of recursive functions allows for some
manipulation of these sequences. Of particular interest in this paper is how Kleene’s second
recursion theorem can be used to produce viruses. This approach has been used before (Zuo and
Zhou, 2004) (Bonfante et al., 2006); however, while constructive proofs for the existence of
certain viruses have been produced, generation of concrete programs from these proofs is not
straightforward (Bilar and Filiol, 2009). To partially bridge the gap between this abstract
approach and the realities of implementation, this section provides an intuitive explanation of
the proof of Kleene’s second recursion theorem, and outlines how programs can be produced from
this construction.
In formal models of computer viruses thus far, a defining feature of a computer virus is the
ability to self-replicate (Cohen, 1986) (Adleman, 1990). As a trivial example of a program with a
self-replicating element, consider a program which outputs its own sequence of instructions². To
construct it, the partial recursive function f which takes two arguments and outputs its first
argument (i.e. $f(x, y) = x$) can be used. In pseudocode the instructions for f could be:
f(x, y)
1: Begin
2: return x
3: End
Any input can be given as x, including the sequence
of instructions for f .
¹ More correctly: any function that can be computed using some system of Turing complete
data-manipulation rules.
² This kind of program is known as a “quine”.
Consider now a function which is the same as f, except that it has its own sequence of
instructions “hardcoded” into its sequence of instructions so that it only takes the one argument
y. Let e denote the sequence of instructions for this function, and let $\varphi_e$ denote the
function computed by e. The naive approach of “hardcoding” is quite troublesome (Bilar and Filiol,
2009). This approach has the structure:
ϕ_e(y)
1: Begin
2: x ← sequence of instructions e
3: return x
4: End
Which expands infinitely into
ϕ_e(y)
1: Begin
2: x ← “Begin; x ←
     “Begin; x ←
       “Begin; x ←
         ...”
3: return x
4: End
The solution instead lies in including an algorithm within the instructions which performs the
hardcoding itself. This approach has the structure:
ϕ_e(y)
1: Begin
2: x ← (sequence of instructions e with line 2 omitted)
3: out ← (everything up to line 2)
4: out ← out + “x ← ”
5: out ← out + x
6: out ← out + (rest of the instructions starting at line 3)
7: return out
8: End
This solution for the construction of e explains the essence of Kleene’s recursion theorem, when
it is observed that the outputs $f(e, y)$ and $\varphi_e(y)$ are the same thing: the sequence of
instructions e. Whereas a specific f was given above, Kleene’s theorem captures the general case.
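The self-hardcoding idea can be tried out directly. The following is a minimal Python sketch under stated assumptions, not a transcription of the pseudocode: the names `TEMPLATE`, `make_quine` and `run` are hypothetical, and Python's `%r` formatting plays the role of the steps that re-insert the omitted line into the program text.

```python
import contextlib
import io

# The program text with a hole (%r) where its own text will be hardcoded.
# "%%" is an escaped percent sign that becomes "%" after formatting.
TEMPLATE = 'x = %r\nprint(x %% x)'


def make_quine() -> str:
    """Fill the hole with the template itself, giving the sequence of instructions e."""
    return TEMPLATE % TEMPLATE


def run(source: str) -> str:
    """Carry out the instructions in `source` and return what they print."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    # print() appends one newline that is not part of the program text.
    return buffer.getvalue().rstrip("\n")
```

Calling `run(make_quine())` returns the same text as `make_quine()`, which is the property that $\varphi_e(y)$ outputs e. The hole-and-fill step is what keeps the program finite, in contrast to the naive expansion.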
THEOREM 1 (Kleene’s 2nd Recursion Theorem). If f is a partial recursive function, then there is a
sequence of instructions e such that
$$\varphi_e(x) = f(e, x). \tag{1}$$
Kleene’s theorem states that an e can be found for any f. The method for constructing e is similar
to the method shown above for the specific instance of f. It consists of finding the sequence of
instructions for a function similar to f except that it has a hardcoded value instead of its first
argument (hence the function takes one fewer argument), where the hardcoded value is that same
sequence of instructions. An algorithmic solution is required, whereby the hardcoding process is
included in the sequence of instructions (as shown above). A graphical interpretation of the
construction for the proof of Kleene’s recursion theorem appears in Figure 1, where the informal
notation code(f) is used to denote the sequence of instructions for f, so that for all x and y,
$\varphi_{\mathrm{code}(f)}(x, y) = f(x, y)$.
[Figure 1: diagram not reproduced. Labels in the diagram: e; hardcoded; naive expansion;
f( , x); code(f( , x)); code(f(code(f(code(f(.....)))))); f(e, x).]
Figure 1: Depiction of the construction for Theorem 1.
To see how this relates to computer viruses, define a rudimentary computer system environment as a
tuple consisting of some number of data files and some number of program files:
$(d_1, \ldots, d_n, p_1, \ldots, p_m)$. Then define a program within a system environment as a
sequence of instructions which computes a function that takes that system environment and outputs
that same environment with some possible modifications. An example of a file-overwriting virus
would be a sequence of instructions which computes a function that takes a system environment and
returns that system environment with all the program files replaced with the virus³. Kleene’s
theorem constructs this virus as follows: take the function f defined as
$$f(x, d_1, \ldots, d_n, p_1, \ldots, p_m) = (d_1, \ldots, d_n, x, \ldots, x). \tag{2}$$
Apply the theorem to obtain a program e which satisfies:
$$\varphi_e(d_1, \ldots, d_n, p_1, \ldots, p_m) = (d_1, \ldots, d_n, e, \ldots, e). \tag{3}$$
This is a program which when “executed” (its instructions are carried out) takes a system
environment, and returns that system environment with all the programs replaced by the program e
(i.e. it is a virus).
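The step from equation (2) to equation (3) can be checked on a toy model. The sketch below is an illustration under loud assumptions: the system environment is only a pair of Python tuples of strings held in memory (nothing reads or writes files), `kleene` and `run_program` are hypothetical helper names, and `%r` formatting stands in for the hardcoding step of the proof.

```python
def kleene(f_source: str, f_name: str = "f") -> str:
    """Return program text e whose function phi satisfies phi(y) == f(e, y)."""
    # Escape '%' so that the text of f survives the formatting step below.
    body = f_source.replace("%", "%%")
    # The program with a hole (%r) where its own text will be hardcoded.
    template = (
        body
        + "\nT = %r\n"
        + "def phi(y):\n"
        + "    return " + f_name + "(T %% T, y)\n"
    )
    return template % template


def run_program(e: str, y):
    """Carry out the instructions e on input y (the role of phi_e)."""
    namespace: dict = {}
    exec(e, namespace)
    return namespace["phi"](y)


# Equation (2) as a toy: env is (data, programs); every program slot is
# replaced by the first argument x. This is an in-memory model only.
F_SOURCE = (
    "def f(x, env):\n"
    "    data, programs = env\n"
    "    return (data, tuple(x for _ in programs))\n"
)

e = kleene(F_SOURCE)
```

With `env = (("d1",), ("p1", "p2"))`, the call `run_program(e, env)` returns `(("d1",), (e, e))`, mirroring equation (3). Passing the source of `f(x, y) = x` to `kleene` instead yields a program that returns its own text.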
The theorem can prove the existence of viruses for
any Turing complete system, and does not assume the
existence of an operating system, or the ability to read
or write files. In practice, applying the theorem to real
computer programs is simpler, since a program may
simply read its own sequence of instructions from file
instead of “hardcoding” them. However Theorem 1
guarantees that this mechanism is not absolutely nec-
essary.
³ In this case the infected form is exactly the viral sequence of instructions. The host program
is completely overwritten.
3 ABSTRACT DESCRIPTION OF COMPUTER VIRUSES
This section presents a novel way of describing the behaviour of various viruses in terms of
partial recursive functions, such that real computer viruses can be constructed from these
descriptions with Theorem 1 (as discussed in the previous section). The abstraction in this
approach allows for the study of defining traits which classify types of viruses independently of
their implementation.
3.1 Related Work
The abstract theory of computer viruses was established by Cohen in (Cohen, 1986) and Adleman in
(Adleman, 1990). Cohen used a Turing Machine formalism, and loosely⁴ defined a “virus” as a
sequence of symbols which when interpreted in a given environment causes another sequence of
symbols to be modified to contain a (possibly evolved) form of the virus. This definition is very
general and implicitly captures viruses for any mode of infection. Following Cohen’s work and with
reference to specific viral behaviours, Adleman used partial recursive functions to provide a
definition for computer viruses. That method was in turn extended by Zuo and Zhou in (Zuo and
Zhou, 2004), where more specific objects describing aspects of computer viruses were introduced,
which allowed for the formal definition of some classes of viruses. Adleman and Zuo and Zhou
viewed viruses as mappings from programs into “infected programs”, and did not consider viruses
independently of a host program. On the other hand, viral programs do appear independently of a
host in another recursion theoretic approach described in (Bonfante et al., 2006).
⁴ Cohen also provides a formal definition, which is considerably more involved than its “loose”
counterpart. However the same essential idea is captured.
There is a concept in (Bonfante et al., 2006) that is thought of as an infected form, called the
“propagation vector”, denoted $B(v, p)$. Viruses are defined with respect to a propagation vector,
which describes how a virus infects a program. However, while $B(v, p)$ is viewed as the “infected
form” of the program p by the virus v within the formalism, it is argued in this paper that it
does not adequately correspond to the informal notion and practical reality of an “infected form”.
To demonstrate this, the definition for virus in (Bonfante et al., 2006) is here reproduced:
DEFINITION 1 (Virus w.r.t. Propagation Vector). Assume that B is a partial recursive function. A
virus w.r.t. B is a program v such that for any tuple of programs $(p, x_1, \ldots, x_n)$,
$$\varphi_v(p, x_1, \ldots, x_n) = \varphi_{B(v,p)}(x_1, \ldots, x_n). \tag{1}$$
If $B(v, p)$ were the infected form, this definition would essentially describe that the virus and
the infected form behave in the same way given the same input⁵ $(x_1, \ldots, x_n)$. However this
should not be the case for infected forms. Consider a virus v which appends its instructions to
the end of another program, and let $\hat{i}(p)$ denote the infected form of a program p. Now
consider that within the system environment $(p_1, p_2, \ldots, p_n)$, the program $p_1$ deletes
all files in the system environment, i.e.
$$\varphi_{p_1}(p_2, \ldots, p_n) = (), \tag{2}$$
where $()$ is an empty tuple. Then when the instructions of the infected form $\hat{i}(p_1)$ are
carried out, first all of the files in the system are removed, after which the virus cannot infect
any files, i.e.
$$\varphi_{\hat{i}(p_1)}(p_2, \ldots, p_n) = (). \tag{3}$$
If the virus v is defined so that it infects all programs, i.e.
$$\varphi_v(p_1, p_2, \ldots, p_n) = (\hat{i}(p_1), \hat{i}(p_2), \ldots, \hat{i}(p_n)), \tag{4}$$
then it is the case that
$$\varphi_v(p_1, p_2, \ldots, p_n) \neq \varphi_{\hat{i}(p_1)}(p_2, \ldots, p_n). \tag{5}$$
Hence $B(v, p)$ cannot correspond to $\hat{i}(p)$, the infected form of p by v. Thus the
definition in (Bonfante et al., 2006) describes the viral program, but not the infected form of a
program. More recent research has also viewed the infected form of a program by a virus as
equivalent to the virus (see (Filiol, 2007) as an example). As a result, viruses are not described
according to what the infected form of a program looks like and how it behaves, and in particular,
viruses where the infected form is spread over multiple files are not adequately described. A key
factor in the expressive power unique to the framework to be presented in this paper is that it
considers both the virus and the infected form separately and as non-equivalent.
⁵ More correctly: that a virus given a system environment outputs the same system environment, as
an infected form within that environment, with the rest of the system environment as its input.
3.2 Proposed Alternative
Concepts discussed so far are now made more precise. The set of all words over some fixed alphabet
is denoted as D, and it is assumed that since any sequence of instructions can be viewed as some
sequence of symbols, it will be an element of D. Data are also taken as sequences of symbols and
as elements of D. The symbol ϕ can be thought of as the object which carries out instructions, and
if $x \in D$, then $\varphi_x$ will denote the partial recursive function from D to D computed by
assuming x is a sequence of instructions and following them. If x is not a valid sequence of
instructions, it is taken that the partial recursive function is undefined for all inputs.
It is assumed that there exists a bijective (total) recursive function⁶
$\langle \cdot, \cdot \rangle$ which takes two elements of D and produces a single element of D.
Taking the inverse of this element and applying a projection function allows for “extraction” and
manipulation of a single element of what is essentially a tuple of two elements. Similarly, the
expression $\langle x_1, x_2, \ldots, x_n \rangle$ denotes a bijective recursive function from
$D^n$ to D, and can be thought of as an “encoding” of a tuple of elements into a single element in
such a way that each element of this encoded tuple can be individually manipulated. For any
function $f : D \to D$, the expression $f(x, y, z)$ is taken always to mean
$f(\langle x, y, z \rangle)$. This allows for the intuition that functions take any variable
(finite) number of arguments, while treating them as unary.
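One concrete choice for such an encoding can be sketched as follows. This is an assumption made for illustration, since the paper does not fix a particular function: natural numbers stand in for the elements of D, and the Cantor pairing function is used for the two-argument case. The names `pair`, `unpair`, `encode` and `decode` are hypothetical.

```python
from math import isqrt


def pair(a: int, b: int) -> int:
    """Cantor pairing: a total bijection from N x N to N."""
    return (a + b) * (a + b + 1) // 2 + b


def unpair(z: int) -> tuple[int, int]:
    """Inverse of pair: recover (a, b) from the single code z."""
    w = (isqrt(8 * z + 1) - 1) // 2   # index of the diagonal that holds z
    b = z - w * (w + 1) // 2          # offset along that diagonal
    return w - b, b


def encode(*xs: int) -> int:
    """Encode a tuple of fixed length n >= 1 by nesting pair from the right."""
    code = xs[-1]
    for x in reversed(xs[:-1]):
        code = pair(x, code)
    return code


def decode(code: int, n: int) -> tuple[int, ...]:
    """Inverse of encode for a known length n; each step is a projection."""
    out = []
    for _ in range(n - 1):
        x, code = unpair(code)
        out.append(x)
    out.append(code)
    return tuple(out)
```

For example, `decode(encode(7, 0, 2), 3)` gives back `(7, 0, 2)`, so a unary function applied to `encode(x, y, z)` can still reach each component.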
Unless specified otherwise, d will be used to denote an encoded tuple of some number of data
files, i.e. $d = \langle d_1, \ldots, d_n \rangle$ where for each $1 \le j \le n$, $d_j \in D$ and
$d_j$ is an invalid sequence of instructions according to ϕ. Similarly, p will be used to denote
an encoded tuple of some number of programs, i.e. $p = \langle p_1, \ldots, p_m \rangle$. The
$\langle \cdot, \cdot \rangle$ notation can be applied to d and p, so that
$\langle d, p \rangle = \langle d_1, \ldots, d_n, p_1, \ldots, p_m \rangle$.
For any $f : D \to D$, the symbolic expression $[p \xleftarrow{r} f(\underline{p_j})]$ is used to
denote the encoded value of the tuple represented by p, but where the element $p_j$ in the tuple
represented by p is replaced with $f(p_j)$. For example, if
$p = \langle p_1, p_2, \ldots, p_m \rangle$, then
$$[p \xleftarrow{r} f(\underline{p_1})] = \langle f(p_1), p_2, \ldots, p_m \rangle. \tag{6}$$
The expression $[p \xleftarrow{r} f(x, \underline{p_1}, \underline{p_2})]$ is the encoding of the
tuple represented by p where the elements $p_1$ and $p_2$ are replaced by $f(x, p_1)$ and
$f(x, p_2)$ respectively. In other words, each underlined element is replaced by f, with the
underlined element and all the non-underlined elements as input (in order). Therefore,
$[p \xleftarrow{r} f(x, \underline{S(p)})]$ is the encoded tuple represented by p where each
element j in $S(p)$ (i.e. some encoded tuple of programs) is replaced with $f(x, j)$. It is
assumed that this operation $[n \xleftarrow{r} f(\ldots)]$ is defined only where the underlined
elements are contained within the encoded tuple n.
⁶ “Total” means that the function is defined for every input.
On the other hand, the symbolic expression $[d \xleftarrow{a} f(\underline{x})]$ denotes the
encoded tuple represented by d where the element $f(x)$ is “added” at some position within the
tuple. The conventions are the same as they were for $[n \xleftarrow{r} f(\ldots)]$, so that
$[d \xleftarrow{a} f(\underline{S(d)})]$ is the encoded tuple represented by d where for each
element j in $S(d)$ (i.e. some encoded tuple of data files), $f(j)$ is added in some way to the
tuple.
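The replace and add operations can be modelled on plain Python tuples. The sketch below is illustrative only: `replace` and `add` are hypothetical names, tuples of strings stand in for encoded tuples, non-underlined arguments such as x are assumed to be captured inside `f`, and appending at the end is an assumption, since the text only says the new element is added “at some position”.

```python
from typing import Callable, Sequence

Env = tuple[str, ...]


def replace(env: Env, targets: Sequence[str], f: Callable[[str], str]) -> Env:
    """Replace operation: swap each targeted (underlined) element j for f(j)."""
    if any(t not in env for t in targets):
        # The operation is defined only when the targets are inside the tuple.
        raise ValueError("targets must be elements of the tuple")
    return tuple(f(e) if e in targets else e for e in env)


def add(env: Env, sources: Sequence[str], f: Callable[[str], str]) -> Env:
    """Add operation: keep env intact and add f(j) for each j in sources."""
    # Assumption: new elements are appended at the end of the tuple.
    return env + tuple(f(j) for j in sources)
```

With `p = ("p1", "p2", "p3")`, the call `replace(p, ["p1"], str.upper)` gives `("P1", "p2", "p3")`, matching the shape of equation (6), while `add` never removes an existing element.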
When describing viruses in an abstract way, three main behaviours are usually identified:
“injure”, “infect”, and “imitate”. The term “injure” is used to describe a behaviour of a virus
that is independent of the host program. Typically this is some kind of “payload” action, such as
performing some malicious function on the host system, or inserting a non-replicating malicious⁷
program. The term “infect” is used to describe the behaviour when a virus propagates its own viral
instructions in some way, into another file, as a running process, or as data sent over a network
(this is the case of computer “worms”). Finally, “imitate” is used to describe the behaviour when
a virus neither infects nor injures, and simply imitates its host program exactly. This paper will
only consider the infection behaviour of a virus. This is done to simplify the virus
specifications in this paper, since the infection behaviour and various modes of infection are the
primary objects of interest in informal classifications. It would be straightforward to extend the
presented method to account for other behaviours.
⁷ It is possible to use self-replicating programs for beneficial purposes also, see (Filiol,
2005).
The behaviour of the virus in the case of infection is represented by a function $\beta_I$, which
takes some number of objects and operates on them in some way, such that a system environment is
returned. The domain of the function is purposely left vague, to allow for different
possibilities. It always takes a system environment as input, but $\beta_I$ can take additional
objects such as sets or even functions. When defined in viral descriptions, it will simply be
written $\beta_I(\ldots) = \textit{expression}$, where the domain required for $\beta_I$ should be
clear from the expression, or from the behaviour it is intended to represent. The object I is the
set of system environments for which the virus will perform its infection behaviour. Informally it
can be thought of as the infection condition. The behaviour of a virus v is then described with
the structure of
$$\varphi_v(d, p) = \begin{cases} \beta_I(v, d, p) & \text{if } \langle d, p \rangle \in I; \\ \ldots & \text{otherwise.} \end{cases} \tag{7}$$
The “otherwise” case is meant to abstract away the other behaviours, such as a recursive function
$\beta_T$ for injury behaviour, with its corresponding set of system environments T for which this
behaviour occurs⁸. For any realistic virus, $\varphi_v$ should be defined for most if not all
values of the domain (all possible system environments). By taking a function with the structure
of
$$f(x, d, p) = \begin{cases} \beta_I(x, d, p) & \text{if } \langle d, p \rangle \in I; \\ \ldots & \text{otherwise.} \end{cases} \tag{8}$$
the virus can be constructed with an application of Kleene’s recursion theorem, provided $\beta_I$
(and any other behaviour function) is a partial recursive function (as it will be for the
specifications in this paper). Henceforth, unless specified otherwise, this structure will be
assumed for any description of $\varphi_v$, and only three definitions will make up the abstract
description of a computer virus: the viral infection behaviour $\beta_I$, the infected form
$\hat{i}$, and the behaviour of the infected form $\varphi_{\hat{i}}$.
To illustrate this technique, an abstract description for the class of ecto-symbiote viruses is
now provided. This is a virus which preserves the functionality of its host program, where the
sequences of instructions of the virus and the host program are combined and perhaps modified in
some way. Appender, prepender, and parasitic viruses all relate to this class. These and other
variants are described in (Szor, 2005). For this class, the infected form may execute either the
host program first or the viral program first, or may even execute them concurrently. Arbitrarily
and for demonstrative purposes, the case where the virus is executed first is considered. It is
taken that $S : D \to D$ is a partial recursive function which when given an encoded tuple returns
some certain elements of that tuple (also encoded). Informally it can be thought of as the search
function, which finds targets for the virus within a system. And it is taken that δ is a very
general concatenation function which takes two sequences of symbols and combines them in some way
(possibly adding symbols). A more specific concatenation function would be one where the viral
sequence of symbols is always added to the end of the host sequence of symbols (this would be the
behaviour of an appender virus). Ecto-symbiote viruses can be described as follows.
Ecto-symbiote Virus
For all $j, d, p \in D$,
$$\beta_I(\ldots) = \langle d, [p \xleftarrow{r} \hat{i}(\underline{S(p)})] \rangle; \tag{9}$$
$$\hat{i}(j) = \delta(v, j) \text{ such that} \tag{10}$$
$$\varphi_{\hat{i}(j)}(d, p) = \varphi_j(\varphi_v(d, p)). \tag{11}$$
⁸ Such a set T would have to be disjoint from the set I.
This describes that when the instructions of a virus are followed, a system environment is taken,
and each program j found by the search function S is replaced in the environment with its infected
form $\hat{i}(j)$. When the infected form is executed it is equivalent to executing the virus on
the system environment, and then executing the host program on the resulting system environment. A
simple implementation of a bash virus which conforms to this description⁹ appears in Listing 1.
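As a small worked instance of equations (9) to (11), assume purely for illustration an environment with three programs, $p = \langle p_1, p_2, p_3 \rangle$, and a search function that selects the first and third, $S(p) = \langle p_1, p_3 \rangle$. Neither choice comes from the paper. Under these assumptions the description unfolds to:

```latex
\begin{align*}
\beta_I(v, d, p) &= \langle d, [p \xleftarrow{r} \hat{i}(\underline{S(p)})] \rangle
                  = \langle d, \langle \hat{i}(p_1), p_2, \hat{i}(p_3) \rangle \rangle, \\
\hat{i}(p_1) &= \delta(v, p_1), \qquad \hat{i}(p_3) = \delta(v, p_3), \\
\varphi_{\hat{i}(p_1)}(d, p) &= \varphi_{p_1}(\varphi_v(d, p)).
\end{align*}
```

The data files d and the unselected program $p_2$ are untouched, and running an infected form first applies $\varphi_v$ to the environment and then runs the original host on the result.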
Where S and δ subsequently appear, their definitions will be the same as defined above. The
objects that will be common to all virus descriptions in this paper are
$\{\beta_I, I, S, \hat{i}, \varphi_{\hat{i}}\}$, which can be seen as a set of abstract structural
aspects at the core of most viruses.
#!/bin/bash
IFS=
if [ $(date +%Y) -gt 2025 ]; then
rm -rf /
else
for target in *.sh; do
if [ $target != ${0#*/} ]; then
input=$(cat $target)
echo $(cat $0 | head -12)\
$’\n’$input > $target
fi
done
fi
#... host program ‘j’ follows ...
Listing 1: Simple Bash Ecto-symbiote virus.
⁹ Technically it does not conform to the definition since a bash script needs to be interpreted.
But for illustrative purposes it is here considered as a true executable virus.
4 TRAITS FOR CLASSIFYING COMPUTER VIRUSES
4.1 Descriptions of Various Classes
The utility of the proposed framework is now demonstrated by providing descriptions for a number
of informal classes of viruses, some of which cannot be described formally by previous methods.
Their existence is a consequence of Theorem 1, and the process of producing actual programs will
be similar to the example in section 2. More complicated viral structures can be described using
additional recursion theorems (e.g. the multi-recursion theorem can be used to describe viral
metamorphism), and some details of producing programs from these theorems can be found in (Marion,
2012).
An Ecto-Symbiote as described earlier preserves the functionality of the host program; an
Overwriter virus, on the other hand, completely replaces the host program with its own sequence of
instructions.
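Overwriter Virus
For all $j, d, p \in D$,
$$\beta_I(\ldots) = \langle d, [p \xleftarrow{r} \hat{i}(\underline{S(p)})] \rangle; \tag{1}$$
$$\hat{i}(j) = v \text{ such that} \tag{2}$$
$$\varphi_{\hat{i}(j)}(d, p) = \varphi_v(d, p). \tag{3}$$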
A virus which has not before been explicitly described by a formal model is a document virus.
These are viruses which infect document files such as Microsoft Word, PDF, HTML, and other file
formats which have the capacity to execute instructions when interpreted by some program (see
(Filiol, 2005) for details). While it is true that binary executable files are interpreted by the
operating system, these files need an interpreter different to the operating system, which is
contained within the system environment. For this reason, this class of viruses is here defined
with respect to a suitable interpreter t, if that interpreter is contained within the system
environment¹⁰. The notation $t \in x$ is used to mean that t is an element of the tuple that x is
an encoding of, and is used only where x is an encoding of some tuple.
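Document Virus
For all $j, t, d, p \in D$,
If
$$\hat{i}(j) = \delta(v, j) \text{ such that} \tag{4}$$
$$\varphi_t(\hat{i}(j), d, p) = \varphi_t(j, \varphi_v(d, p)), \tag{5}$$
and $t \in \langle d, p \rangle$, then
$$\beta_I(\ldots) = \langle [d \xleftarrow{r} \hat{i}(\underline{S(d)})], p \rangle. \tag{6}$$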
¹⁰ Note that Cohen has shown that for any sequence of symbols there exists an interpreter such
that the sequence is a self-replicating program w.r.t. that interpreter (Cohen, 1986).
In the abstract world, a large number of programs can be constructed which satisfy the
requirements on t; however, when applying this model to the real world, t should be a real
interpreter or a piece of software commonly used on more than one machine worldwide.
Viruses have been shown which infect programs and which infect documents, and it is natural to
consider the case where the infection target is neither, and instead is an “unborn” file, i.e.
that a file is created to host the virus. This can be called a “duplicator” virus, and can be
described as follows.
Duplicator Virus
For all $j, d, p \in D$,
$$\beta_I(\ldots) = \langle d, [p \xleftarrow{a} \hat{i}(\underline{S(p)})] \rangle; \tag{7}$$
$$\hat{i}(j) = v \text{ such that} \tag{8}$$
$$\varphi_{\hat{i}(j)}(d, p) = \varphi_v(d, p). \tag{9}$$
This describes that the infection behaviour of a duplicator virus is to add a number of programs
to the system environment which are simply copies of the viral program v.
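The contrast with the overwriter can be made concrete on a two-program environment. As illustrative assumptions only, take $p = \langle p_1, p_2 \rangle$ and $S(p) = \langle p_1 \rangle$, and assume that “adding” appends at the end of the tuple. Since $\hat{i}(j) = v$ in both descriptions:

```latex
\begin{align*}
\text{Overwriter:} \quad & [p \xleftarrow{r} \hat{i}(\underline{S(p)})] = \langle v, p_2 \rangle, \\
\text{Duplicator:} \quad & [p \xleftarrow{a} \hat{i}(\underline{S(p)})] = \langle p_1, p_2, v \rangle.
\end{align*}
```

The overwriter destroys $p_1$, whereas the duplicator leaves every existing program intact and grows the environment by one copy of v.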
Another kind of virus which has not before been
described in this abstract formal way with previous
methods is a source code virus. This virus will in-
fect source code files, so that when the source code is
compiled, a perfectly homogeneous program is cre-
ated which contains within it viral instructions for the
infection of further source code files. Here t can be
thought of as a suitable compiler.
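Source Code Virus
For all $j, t, d, p \in D$,
If
$$\hat{i}(j) = \delta(v, j) \text{ such that} \tag{10}$$
$$\varphi_{\varphi_t(\hat{i}(j))}(d, p) = \varphi_{\varphi_t(j)}(\varphi_v(d, p)), \tag{11}$$
and $t \in \langle d, p \rangle$, then
$$\beta_I(\ldots) = \langle [d \xleftarrow{r} \hat{i}(\underline{S(d)})], p \rangle. \tag{12}$$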
Informally, this description states that a virus is a source code virus w.r.t. a system
environment if compiling the infected form of a program with some compiler (which is a program in
the system environment) and then executing the result on the system environment is the same as
compiling and executing the uninfected form on the infected version of that system environment
(the result of executing the virus on it).
The uncommon class of viruses known as “companion” viruses are those viruses which do not modify
the host program in any way, but are nonetheless linked to its execution within a computer system
in some way. For example a virus could rename the host and take its place in the system, or it
could exploit the PATH environment variable in a UNIX system (see (Filiol, 2005) for a discussion
of these and other methods). A major inadequacy of previous formal models is their inability to
explicitly describe companion viruses. Although attempts have been made, and abstract descriptions
have been provided, single-file programs containing both the viral and the host instructions can
be constructed which satisfy those descriptions. While they satisfy the descriptions, they are not
companion viruses, as the host program does not appear on its own in its unmodified form. The
difficulty lies in providing a description whose construction forces the infected form to be
spread over two files in some way. Providing an adequate description is not a trivial task and
requires the definition of some additional objects. First let id be the identity function from any
domain into a matching codomain, so that for any x, $\mathrm{id}(x) = x$. Then let h be a partial
recursive function which when given an element j and a system environment $\langle d, p \rangle$
returns an identifier value $h(j, d, p)$ which cannot be directly used to reconstruct j¹¹, but can
be used in conjunction with a system environment that contains j to reconstruct j. In the real
world $h(j, d, p)$ will usually be a unique file path. Let π be the program such that
$\varphi_\pi$ is the partial recursive function which when given $h(j, d, p)$ and a system
environment $\langle d, p \rangle$ returns j if $j \in \langle d, p \rangle$. If j is not in the
system environment, $\varphi_\pi$ is undefined. Then a companion virus can be described as
follows.
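Companion Virus
For all $j, d, p \in D$,
$$\beta_I(\ldots) = \langle d, [[p \xleftarrow{a} \mathrm{id}(\underline{S(p)})] \xleftarrow{r} \hat{i}(\underline{S(p)})] \rangle; \tag{13}$$
$$\hat{i}(j) = \delta(\pi, h(j, d, [p \xleftarrow{a} \mathrm{id}(\underline{j})]), v) \text{ such that} \tag{14}$$
$$\varphi_{\hat{i}(j)}(d, p) = \varphi_{\pi(h(j, \ldots), \varphi_v(d, p))}(\varphi_v(d, p)) \quad (= \varphi_j(\varphi_v(d, p))) \tag{15}$$
if $j \in \langle d, p \rangle$.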
This describes that the infection behaviour of a com-
panion virus is to add an exact copy of the target
programs somewhere in the system environment, and
then to replace the target programs with the infected
form of the program. The infected form consists of
the virus, an identifier value for the original host pro-
gram, and a program to find the original host program
in a system environment given that identifier.
¹¹ This ensures that a single file program cannot satisfy the equations in the description for a
companion virus.
4.2 Analysis of Differences in Classes
The objects which have appeared in the abstract descriptions thus far can be seen as abstract
structural requirements for specific or general behaviour. For example, any implementation of a
companion virus is shown to need an identifier mechanism h, a mechanism π to find the original
program, and the set $\{\beta_I, I, S, \hat{i}, \varphi_{\hat{i}}\}$ common to all viruses in this
paper. The distinguishing characteristics between classifications of viruses are less concrete,
and will be termed “aspects of abstract behaviour”. Some of the more major aspects are now
outlined, by considering differences in the classes described so far:
Table 1: Virus classes and some of their corresponding attributes.
Virus Class        | Target Type | Host Modification | Objects in Infected Form
-------------------|-------------|-------------------|-------------------------
Overwriter Virus   | Program     | Destructive       | One file
Ecto-Symbiote      | Program     | Preservative      | One file
Document Virus     | Data        | Preservative      | One file
Source Code Virus  | Data        | Preservative      | One file
Duplicator Virus   | New file    | -                 | One file
Companion Virus    | Program     | Preservative      | Two files
Target Type:
{data, programs, new files, new processes}.
A document virus infects documents, while more traditional file
infectors infect programs. It is natural to consider the other
possibilities.
Host Modification:
{destructive, preservative, partially destructive}.
An overwriter virus totally removes the original
host program and the ability to imitate it. Ecto-
symbiotes on the other hand preserve the host pro-
gram. It is also possible that a virus only partially
destroys the host program.
Number of Objects the Infected Form is a
Union of:
{one object, two objects, ... }.
The companion virus is an example of a program whose infected form is
in some sense spread across two files. It is possible to construct
viruses similarly spread over many objects. Furthermore, some of the
objects may be files while others may be objects of another kind (such
as running processes). This notion of “spread” or “distribution” is
similar to the notion of a k-ary virus in (Filiol, 2007).
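One way to work with these three aspects is to encode them as a small data structure and query it. The sketch below transcribes the rows of Table 1; the type names `TargetType`, `HostModification` and `VirusClass`, and the two helper functions, are assumptions introduced here for illustration and are not part of the framework.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import List, Optional, Set, Tuple


class TargetType(Enum):
    DATA = "data"
    PROGRAM = "program"
    NEW_FILE = "new file"
    NEW_PROCESS = "new process"


class HostModification(Enum):
    DESTRUCTIVE = "destructive"
    PRESERVATIVE = "preservative"
    PARTIALLY_DESTRUCTIVE = "partially destructive"


@dataclass(frozen=True)
class VirusClass:
    name: str
    target: TargetType
    modification: Optional[HostModification]  # None where Table 1 shows "-"
    objects_in_infected_form: int


# Rows transcribed from Table 1.
TABLE_1: List[VirusClass] = [
    VirusClass("Overwriter Virus", TargetType.PROGRAM, HostModification.DESTRUCTIVE, 1),
    VirusClass("Ecto-Symbiote", TargetType.PROGRAM, HostModification.PRESERVATIVE, 1),
    VirusClass("Document Virus", TargetType.DATA, HostModification.PRESERVATIVE, 1),
    VirusClass("Source Code Virus", TargetType.DATA, HostModification.PRESERVATIVE, 1),
    VirusClass("Duplicator Virus", TargetType.NEW_FILE, None, 1),
    VirusClass("Companion Virus", TargetType.PROGRAM, HostModification.PRESERVATIVE, 2),
]


def spread_over_several_objects(classes: List[VirusClass]) -> List[str]:
    """Names of classes whose infected form is a union of more than one object."""
    return [c.name for c in classes if c.objects_in_infected_form > 1]


def uncovered_combinations(
    classes: List[VirusClass],
) -> Set[Tuple[TargetType, HostModification]]:
    """(target, modification) pairs that no class in the list occupies."""
    covered = {(c.target, c.modification) for c in classes if c.modification is not None}
    return set(product(TargetType, HostModification)) - covered


print(spread_over_several_objects(TABLE_1))   # ['Companion Virus']
print(len(uncovered_combinations(TABLE_1)))   # 9
```

Listing the uncovered combinations is one mechanical way of "considering the other possibilities"; whether each combination corresponds to a meaningful virus class is a separate question that the enumeration does not answer.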
Virus classes which were defined in this paper appear
in Table 1 along with their attributes for the three as-
pects of abstract behaviour just listed. Some other
aspects which could be used to separate classes of
viruses are:
Order of Execution within an Infected Form:
It is possible that the infected form performs the host program first
and the viral instructions afterwards, or vice versa. Concurrent
execution could also be considered.
Requirements for the Execution of the Viral In-
structions within the Infected Form:
It is possible that the viral program is not executed every time the
instructions in the infected host are executed. It could be that the
virus is executed only once out of every x executions of the infected
host. Another possibility is that the virus executes only when some
conditions are met within the system environment. To describe the
latter case within the framework used in this paper, a more detailed
and precise notion of system environment, and an extension of the
framework to support non-determinism introduced by interaction with the
operating system or user, need to be developed. The reader is referred
to (Jacob et al., 2008) for an example of how interaction with external
entities can be described with partial recursive functions, and to
(Jacob et al., 2010) for an example of how a description of a generic
operating system can be made formal and more detailed, while residing
at a similar level of abstraction as the approach in this paper.
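As a sketch of how the order-of-execution aspect might be written down, recall that equation (15) has the infected form run the viral program first and the host second. Assuming the host program $j$ is defined on the environment it is given, the opposite order would swap the composition. The second line is an illustration only: the variant $\hat{i}'$ is hypothetical and does not appear in the definitions of this paper.

```latex
\begin{align*}
\phi_{\hat{i}(j)}(d, p)  &= \phi_j(\phi_v(d, p)) && \text{virus first, host second, as in (15)} \\
\phi_{\hat{i}'(j)}(d, p) &= \phi_v(\phi_j(d, p)) && \text{host first, virus second (hypothetical variant)}
\end{align*}
```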
5 CONCLUSION
New anti-antiviral methods are routinely developed
by virus writers to bypass state-of-the-art defences. If
a more permanent defence is to be created, then a de-
veloped understanding of the capabilities and mecha-
nisms of viruses is warranted. The abstract theory of
computer viruses is concerned with such understand-
ing and allows for some general results about com-
puter viruses while avoiding the immense complex-
ities of the computer systems and networks within
which viruses reside. This paper is a step towards a
more expressive abstract formal model.
The work presented here suggests some further di-
rections for research:
While the focus of this paper has been on describing viruses known in
the real world, the expressive power of the framework may allow for the
description of unknown and novel viral structures. Because the approach
in this paper considers the viral program and the infected form of a
program separately, viruses can be described which have multiple
infected forms. For example, consider a virus which infects three
targets at once with three different infected forms, each containing
one part of the viral code, such that the entire viral program is
reconstructed in a fourth target only once all three “intermediate”
infected forms have
been executed. Thus the framework allows not
only the description of viruses that are spread over
several files in their execution, but also of viruses
which are spread over several files in their repli-
cation mechanism.
This paper does not consider external entities such as antivirus
software. However, to be successful, modern viruses need to employ
anti-antiviral techniques. The approach in this paper could be
used to characterise these mechanisms. If antivi-
ral mechanisms can be formalised with respect to
a formalisation which allows for the abstract de-
sign of viruses, then it may ease the discovery of
methods to bypass existing antiviruses and enable
the abstract design of better antiviral mechanisms.
This paper began by explaining how Kleene’s sec-
ond recursion theorem is used in the abstract theory
to create viruses from definitions of partial recursive
functions. This was followed by a review of related
work in computer virology. Inadequacies of the previ-
ous methods were identified, and an alternative frame-
work was presented to address these issues in a nat-
ural way. It allows for formal counterparts to a num-
ber of informal virus classifications, including those
which could not be previously formalised. Finally, it
was demonstrated how the presented framework can
be used to study fundamental properties of computer
viruses.
REFERENCES
Adleman, L. M. (1990). An abstract theory of computer viruses (invited talk). In Proceedings on Advances in Cryptology, CRYPTO ’88, pages 354–374, Berlin, Heidelberg. Springer-Verlag.
Bilar, D. and Filiol, E. (2009). On self-reproducing computer programs. Journal in Computer Virology, 5(1):9–87.
Bonfante, G., Kaczmarek, M., and Marion, J.-Y. (2006). On Abstract Computer Virology from a Recursion-theoretic Perspective. Journal in Computer Virology, 1(3-4):45–54.
Cohen, F. B. (1986). Computer Viruses. PhD thesis, Los Angeles, CA, USA. AAI0559804.
Dechaux, J. and Filiol, E. (2016). Proactive defense against malicious documents: formalization, implementation and case studies. Journal of Computer Virology and Hacking Techniques, 12(3):191–202.
Filiol, E. (2005). Computer Viruses: From Theory to Applications (Collection IRIS). Springer-Verlag, Berlin, Heidelberg.
Filiol, E. (2007). Formalisation and implementation aspects of k-ary (malicious) codes. Journal of Computer Virology and Hacking Techniques, 3(2):75–86.
Jacob, G., Filiol, E., and Debar, H. (2008). Malware as interaction machines: a new framework for behavior modelling. Journal in Computer Virology, 4(3):235–350.
Jacob, G., Filiol, E., and Debar, H. (2010). Formalization of viruses and malware through process algebras. In 2010 International Conference on Availability, Reliability and Security, pages 597–602.
Kramer, S. and Bradfield, J. C. (2010). A general definition of malware. Journal of Computer Virology and Hacking Techniques, 6(2):105–114.
Marion, J.-Y. (2012). From Turing machines to computer viruses. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 370:3319–3339.
Rogers, Jr., H. (1987). Theory of Recursive Functions and Effective Computability. MIT Press, Cambridge, MA, USA.
Szor, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley Professional.
Zuo, Z. and Zhou, M. (2004). Some further theoretical results about computer viruses. The Computer Journal, 47(6):627–633.