Loop-loop Interaction Metrics on RNA Secondary Structures with Pseudoknots

Michela Quadrini and Emanuela Merelli

University of Camerino, via Madonna delle Carceri, Camerino, Italy

Keywords:

Structure Comparison, RNA Secondary Structures, Interaction Distance, Pseudoknotted Structure, Loop.

Abstract:

Many methods have been proposed in the literature for comparing RNA secondary structures. From a biological point of view, most of these methods are satisfactory for pseudoknot-free secondary structures, whereas the comparison of pseudoknotted motifs has not been solved yet. In this paper, we propose loop-loop interaction metrics, a new measure that computes the distance between two pseudoknotted secondary structures by comparing loops and their interactions. The new measure is defined for RNA molecules whose structural and biological information is represented as algebraic expressions of hairpin loops, so that each RNA secondary structure can be represented as a word, which describes the interactions among loops and uniquely defines the intersection set, i.e., the set of pairs of loops that cross. The interaction metrics is then defined as the cardinality of the symmetric set difference of the intersection sets of two molecules. To illustrate the proposed methodology, we compare two RNA molecules, PKB66 and PKB10, extracted from the Pseudobase++ database. To test the validity of the measure, we evaluated the evolutionary conservation of the pseudoknot domain of Vertebrate Telomerase RNA.

1 INTRODUCTION

Ribonucleic acid (RNA) is a linear polymer of nucleotides arranged in a sequence referred to as the primary structure. This sequence is made of four different types of nucleotides, known as Adenine (A), Guanine (G), Cytosine (C) and Uracil (U). The nucleotides are linked together by phosphodiester bonds in such a way that an orientation can be established according to the 5′ to 3′ polarity of the molecule. Neutralization of the molecule determines the initial event of the folding process, which generates complex three-dimensional shapes (Dill, 1990), (Ferré-D'Amaré and Doudna, 1999). During this process, each nucleotide can interact with at most one other nucleotide, establishing a hydrogen bond. In this work, the phosphodiester bond between two consecutive nucleotides is referred to as a strong interaction, while the relations dynamically created during the folding process are called weak interactions. Both interactions are chemical bonds: the latter, in contrast to the former, are weak bonds that can be easily broken, and their formation is subject to restrictions. In fact, each nucleotide can pair with another one only by forming a Watson-Crick base pair (G-C or A-U) or a wobble base pair (G-U). In two dimensions, the folding process can produce many RNA secondary structures, depending on the free energy of the RNA configurations. An RNA secondary structure is composed of five basic structural elements, namely hairpins, bulges, internal loops, multi-loops and helices (or stacks). Each structural element is generated when at least one base pair is formed; thus, each of them is characterized by strong and weak interactions. Since each structural element forms a loop, secondary structures are composed of loops. If no interaction among loops is present, the secondary structure is pseudoknot free, as illustrated in Figure 1 (A); otherwise it is pseudoknotted, as depicted in Figure 1 (B).

Pseudoknots are tertiary structures that occur

widely in RNA and they play a multitude of roles

in the cell (Staple and Butcher, 2005), including

the catalysis of various ribozymes (Rastogi et al.,

1996), and the alteration of gene expression by induc-

ing ribosomal frameshifting in many viruses (Shen

and Jr, 1995). The biological functions of an RNA

molecule depend on its structure (Laskowski and

Thornton, 2008). The presumption is that a preserved function corresponds to a preserved configuration. In other words, the molecule cannot sustain substantial changes to its secondary and tertiary structure if it is to preserve a particular function.

Quadrini, M. and Merelli, E.

Loop-loop Interaction Metrics on RNA Secondary Structures with Pseudoknots.

DOI: 10.5220/0006610700290037

In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 3: BIOINFORMATICS, pages 29-37

ISBN: 978-989-758-280-6

Figure 1: RNA secondary structure.

Therefore, the

structure comparison is used in the classification of RNA molecules, the prediction of the folding process and the measurement of evolutionary stability. The comparison of RNA secondary structures is one of the basic computational problems in the study of RNAs. In the literature, many approaches have been proposed to address this problem. One of them consists in comparing ordered trees, but it works only for pseudoknot-free RNA structures, since only this type of structure can be mapped into an ordered tree. The methods for ordered tree comparison can be generally classified into two categories: tree editing and tree alignment (Herrbach et al., 2010). Both are based on edit operations on nodes, i.e., node substitution, node insertion, and node deletion. A cost is associated with each operation. Thus, given two trees, one is changed into the other through a sequence of edit operations, whose total cost is the sum of the costs of the individual operations. In contrast to sequences, the alignment and editing models are not equivalent for trees. In fact, tree editing consists in constructing a common supertree, while tree alignment finds the common subtree. Which model is preferable depends on the biological problem of interest: editing is useful to identify the structures conserved during the folding process, while alignment is suitable for clustering RNA molecules purely at the structural level. The problem of comparing pseudoknotted motifs has not been solved yet. Only a few algorithms have been developed for specific cases of pseudoknots: progress in this field has been hindered by the complexity of the problem. From an algorithmic perspective, the problem of comparing RNA structures is usually formalized as the comparison of arc-annotated sequences featuring crossing interactions. An arc-annotated sequence is a sequence over a given alphabet, together with additional structural information specified by arcs connecting pairs of positions. The problem of computing a distance between two arc-annotated sequences was introduced in (Evans, 1999) with a model that used only three edit operations, on either single nucleotides or base pairs; it has been proved by (Blin and Touzet, 2006) that this problem is NP-hard. Thus, a new representation of RNA secondary structures and a new approach for their comparison are necessary.
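As a small illustration of the arc-annotated view, the following Python sketch stores a structure as a list of arcs over sequence positions and reports which pairs of arcs cross, i.e. satisfy i < k < j < l. The function name `crossing_arcs` and the toy arcs are assumptions made for this example only; they are not taken from the cited works.

```python
from itertools import combinations


def crossing_arcs(arcs):
    """Return the pairs of arcs (i, j), (k, l) such that i < k < j < l.

    `arcs` is an iterable of position pairs; each arc is first normalised
    so that its left end comes first.
    """
    ordered = sorted(tuple(sorted(arc)) for arc in arcs)
    # After sorting, a[0] <= b[0] for every combination (a, b).
    return [(a, b) for a, b in combinations(ordered, 2)
            if a[0] < b[0] < a[1] < b[1]]


# Hypothetical toy structure: (2, 6) is nested in (1, 7); (4, 10) crosses both.
toy_arcs = [(1, 7), (2, 6), (4, 10)]
pairs = crossing_arcs(toy_arcs)
```

A structure is pseudoknot free exactly when this list is empty; any crossing pair signals a pseudoknot.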

In this paper, we define a new measure, loop-loop interaction metrics, able to compute the distance between two pseudoknotted secondary structures in terms of interactions among loops. In particular, we use an algebraic representation of RNA secondary structures, both pseudoknot free and pseudoknotted, recently introduced by (Quadrini et al., 2017), which allows us to represent each RNA secondary structure as an algebraic composition of hairpins. In our model, the hairpin is the basic loop of this representation. First, starting from this algebraic expression, we design a procedure to obtain the abstract algebraic expression of the structure, which allows us to define a set of functions that associate a word with each RNA secondary structure. The word permits us to identify the interactions among loops and to define a unique set, the intersection set, composed of all the pairs of loops that cross each other. Finally, the interaction metrics is defined as the cardinality of the symmetric set difference of the intersection sets of two structures. To illustrate the approach in detail, we apply it to two RNA molecules, PKB66 and PKB10, extracted from the Pseudobase++ database. To test the measure, we evaluated the evolutionary conservation of the pseudoknot domain of Vertebrate Telomerase RNA. The main feature of this structure is the evolutionary conservation of four structural domains: the pseudoknot domain, the CR4-CR5 domain, the Box H/ACA domain and the CR7 domain (Chen et al., 2000).

The paper is organized as follows. In Section

2, we present related works regarding the RNA sec-

ondary structure comparison. The measure of RNA

secondary structures with pseudoknots, that we pro-

pose, is introduced in Section 3, which in turn is or-

ganized into three subsections. In the ﬁrst subsection,

we report an algebraic expression of RNA secondary

structures in terms of hairpins. In the second, start-

ing from the deﬁned algebraic expression, we intro-

duce an appropriate procedure to obtain the abstract

algebraic expression of the structure. Moreover, a set

of functions able to associate a unique word to each

abstract algebraic expression is also deﬁned in this

subsection. In the last subsection, the measure is described and an example of its application is shown.

BIOINFORMATICS 2018 - 9th International Conference on Bioinformatics Models, Methods and Algorithms

The results and some critical considerations are dis-

cussed in Section 4. The paper closes with some con-

clusions and future work in Section 5.

2 RELATED WORKS

The structure of a molecule provides a framework

for its biological functions (Laskowski and Thornton,

2008). Thus, the knowledge of structures is very im-

portant and the ability to compare them is useful in the

study of the function and evolution of RNA. In the

literature, there are several approaches to represent

RNA secondary structures which consist of formal-

izing them in terms of base pair sets, trees, graphs or

diagram representations. As a consequence, several

approaches have been proposed for RNA secondary

structure comparisons and corresponding similarity

measurements. The simplest comparison metric is the

base pair distance (Ding et al., 2005), which gives us

the number of different base pairs between two struc-

tures. Other approaches are also possible, such as the

symmetric set difference, the Hausdorff distance, and

the mountain metric (Moulton et al., 2000).
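To make the simplest of these measures concrete, the sketch below computes the base pair distance as the number of base pairs that occur in exactly one of the two structures. This is the usual reading of the definition; the function name `base_pair_distance` and the toy structures are assumptions for illustration.

```python
def base_pair_distance(pairs_a, pairs_b):
    """Number of base pairs present in one structure but not in the other."""
    # Normalise each pair so that (i, j) and (j, i) denote the same base pair.
    set_a = {tuple(sorted(pair)) for pair in pairs_a}
    set_b = {tuple(sorted(pair)) for pair in pairs_b}
    return len(set_a ^ set_b)


# Hypothetical toy structures sharing two of their three base pairs.
structure_a = [(1, 10), (2, 9), (3, 8)]
structure_b = [(1, 10), (2, 9), (4, 7)]
distance = base_pair_distance(structure_a, structure_b)
```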

For comparing structures using a tree representation, a classical approach is to first define a set of basic, atomic operations, called edit operations, that allow one structure to be changed into another. The methods for ordered tree comparison can be generally classified into two categories: tree editing and tree alignment (Herrbach et al., 2010). In terms of alignment, a large number of algorithms based on tree comparison have been designed (Shapiro, 1988), (Le et al., 1989), (Corpet and Michot, 1994). In addition, several tree edit distance metrics have been developed (Shapiro and Zhang, 1990), (Moulton et al., 2000), (Dulucq and Tichit, 2003). However, these approaches cannot take pseudoknotted RNA secondary structures into account. Möhl et al. (Möhl et al., 2010) developed a type system for decompositions. The main idea is that the scheme of a folding algorithm can be transformed into a dynamic programming algorithm for the alignment. Rastegari and Condon (Rastegari and Condon, 2007) proposed a meta-algorithm, which starts by determining the class of each structure and then selects a suitable dynamic programming algorithm. Song et al. (Song et al., 2015) introduced a method for aligning two known RNA secondary structures with pseudoknots, which uses the partition function to calculate the scores of the alignments between bases or base pairs of the two RNAs within a dynamic programming algorithm. Moreover, Evans (Evans, 2011) proposed a polynomial-time algorithm for finding common RNA substructures that include pseudoknots.

3 MATERIAL AND METHODS

The new measure that we propose allows us to compare RNA secondary structures with pseudoknots in terms of interactions among loops. To define it, we use the algebraic expression introduced by (Quadrini et al., 2017). This algebraic expression is obtained from an operator able to model interactions among loops and from its translation into a multiple context-free grammar. These two concepts are recalled in Section 3.1; for more details, the interested reader can refer to (Quadrini et al., 2017). Starting from this algebraic expression, we obtain its abstract algebraic expression through the procedure defined in Section 3.2. In the same section, we also introduce a set of functions that associate a unique word with each abstract algebraic expression. This word permits us to design another procedure that identifies the interactions among loops and defines a set whose elements are pairs of crossing loops. In Section 3.3, the new measure is introduced.

3.1 Algebraic Expression for RNA

Secondary Structures

Each RNA secondary structure is composed of loops, whose composition can be formalized by the operator $\bowtie_k$. The operator maps two arc diagrams into another one, modeling each interaction among loops. It depends on a non-negative integer parameter, k, which indicates that the resulting structure is obtained by attaching the second arc diagram on the k-th nucleotide of the first one. The operator is well-defined if each nucleotide of the resulting structure forms at most one weak interaction. This restriction is due to the nature of RNA molecules. In other words, the situation illustrated in Figure 2 has to be excluded.

Figure 2: Not permitted structure.

It is also well-deﬁned if the two structures do not

share nucleotides, i.e., the ﬁrst arc diagram is fol-

lowed by the second one. In other words, the two

structures are concatenated, as shown in Figure 3.

Formally, it is obtained when k is equal to 0.

Figure 3: Concatenation of loops.

Algebraically, $\langle a_1, a_n \rangle[\alpha]$ represents an RNA secondary structure. More specifically, $\alpha$ is the sequence of nucleotides (backbone) enclosed by the pseudoweak interaction, a fictitious weak interaction between the first nucleotide, $a_1$, and the last one, $a_n$, identified by the pair $\langle a_1, a_n \rangle$. See Figure 4 (A) for an illustration. Note that the molecule in Figure 4 (B) is a special case of an RNA secondary structure, referred to as a pseudoloop in this paper: it is an RNA secondary structure without head and tail. Algebraically, each nucleotide that forms a weak interaction with another one is represented by the symbol ], while unpaired nucleotides are indicated by ε.

Figure 4: An example of secondary structure (A) and pseu-

doloop (B).

An example of the crossing operator application

is illustrated in Figure 5. The second arc diagram is

attached to the ﬁfth nucleotide of the ﬁrst arc diagram.

Figure 5: Example of a crossing operator application.

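Formally, let $S_1$ and $S_2$ be two structures whose nucleotides are denoted by $a$ and $b$, respectively, each labelled by $s \in \{\varepsilon, ]\}$ in $S_1$ and by $\bar{s} \in \{\varepsilon, ]\}$ in $S_2$. The resulting structure, $S_1 \bowtie_k S_2$, is well defined in two cases. If $k = 0$, the two structures are concatenated: the nucleotides of $S_2$ follow those of $S_1$. If $k \le N$, the nucleotides of $S_1$ from $a_k$ onwards ($a_k$, $a_{k+1}$, and so on up to the last one) must coincide, one by one, with the leading nucleotides of $S_2$ up to $b_{N-k}$, and each identification must satisfy $BC$; the resulting structure then consists of the nucleotides of $S_1$ that precede $a_k$, followed by the nucleotides of $S_2$.

Here $BC$ expresses the biological constraint, i.e. each nucleotide forms at most one weak interaction, and it is formalized as follows:

$$BC : (s = \varepsilon,\ (\bar{s} = \varepsilon \vee \bar{s} = ])) \vee (s = ],\ \bar{s} = \varepsilon).$$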
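The biological constraint can be read as a predicate on the labels of two nucleotides that are identified when one arc diagram is attached to another. The Python sketch below is one possible encoding, under the assumption that the label ε is written as the string "eps" and the paired label as "]"; the names `satisfies_bc` and `can_overlap` are hypothetical helpers, not part of the original formalism.

```python
UNPAIRED = "eps"  # stands for the label ε (unpaired nucleotide)
PAIRED = "]"      # label of a nucleotide that already forms a weak interaction


def satisfies_bc(s, s_bar):
    """BC: (s = ε and (s̄ = ε or s̄ = ])) or (s = ] and s̄ = ε)."""
    return (s == UNPAIRED and s_bar in (UNPAIRED, PAIRED)) or \
           (s == PAIRED and s_bar == UNPAIRED)


def can_overlap(labels_first, labels_second):
    """Check BC position by position on two equally long runs of labels."""
    return len(labels_first) == len(labels_second) and all(
        satisfies_bc(s, s_bar) for s, s_bar in zip(labels_first, labels_second)
    )
```

In this reading, the only forbidden combination is the one in which both identified nucleotides are already paired, since the merged nucleotide would then carry two weak interactions.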

This operator is translated into a Multiple Context-Free Grammar (MCFG), introduced in (Seki et al., 1991). This choice is due to the inadequacy of context-free grammars to describe the crossing dependencies of pseudoknots, which can be proved by applying Ogden's Lemma (Harrison, 1978). Thus, a more expressive grammar is required.

Let $\Sigma_{RNA} = \{A, U, G, C\}$ be the alphabet of RNA nucleotides, and let $\overline{\Sigma}_{RNA} = \{(A,U), (U,A), (G,C), (C,G), (G,U), (U,G)\}$ be the alphabet of weak interactions, whose elements represent Watson-Crick or wobble base pairs. The first entry of each pair is the first nucleotide of the hydrogen bond, whereas the second one is the corresponding complementary nucleotide. In other words, the nucleotides are identified by the left, $\pi_1(a_i, a_j) = a_i$, and right, $\pi_2(a_i, a_j) = a_j$, canonical projection functions of the ordered pair. The grammar used is $G_{RNA} = (V_N, V_T, R, S, F)$, where $V_N = \{S, P, L\}$, $V_T = \Sigma_{RNA} \cup \overline{\Sigma}_{RNA} \cup \{[\, , ]\}$, $F = \{f_{(\bowtie,k)}\}$ is the set of partial functions, and the set of productions $R$ is defined as follows:

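$$\begin{array}{rcll}
S & ::= & \alpha\, P\, \alpha & \text{RNA secondary structure} \\
P & ::= & f_{(\bowtie,0)}\llbracket P\alpha, L \rrbracket & \text{Concatenation} \\
  & \mid & f_{(\bowtie,k)}\llbracket P, L \rrbracket & \text{Nesting or Crossing} \\
  & \mid & L & \text{Hairpin} \\
L & ::= & x[\alpha] &
\end{array}$$

where $x \in \overline{\Sigma}_{RNA}$, $\alpha \in \Sigma^{*}_{RNA}$ and

$$f_{(\bowtie,k)}\llbracket S, L \rrbracket = \begin{cases} S \bowtie_k L & \text{if } \bowtie_k \text{ is defined;} \\ \text{undefined} & \text{otherwise.} \end{cases}$$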

The multiple context-free grammar $G_{RNA}$ uniquely generates all RNA secondary structures; as a consequence, each secondary structure can be uniquely decomposed in terms of a particular loop, i.e., the hairpin. The start symbol, S, represents any RNA secondary structure. The first production for P formalizes the concatenation of an RNA pseudoloop P, followed by a possibly empty sequence of nucleotides α, with a loop L, whereas the second one represents both the crossing and the nesting between a pseudoloop P and a loop L. Finally, the production P → L generates a loop. Each loop L is a hairpin, L → x[α], i.e., a Watson-Crick or a wobble base pair that encloses a sequence of unpaired nucleotides, α. As an example, we consider the structure PKB66, obtained from the Pseudobase++ database (Taufer et al., 2008) and illustrated in Figure 6.

Figure 6: The diagram of PKB66 molecule extracted from

Pseudobase++ database (Taufer et al., 2008).

It is a pseudoknot of a SELEX-isolated inhibitor of HIV-1 reverse transcriptase (Burke et al., 1996). The head and the tail of the structure are $\alpha_h$ = CAAGAAC and $\alpha_t$ = ACCA, respectively. The initial pseudoloop involves the nucleotides from the 8th to the 36th. The pseudoloop is composed of crossings among weak interactions, which are formalized by making the hairpins explicit. The order in which the hairpins are chosen is uniquely determined and depends on the complementary nucleotides of the base pairs: at each step, the hairpin of the pseudoloop having the leftmost complementary nucleotides is selected. Thus, the first selected hairpin is $x_8[\alpha_8]$, where $x_8$ = (G,C) and $\alpha_8$ = GGUGAGAACCGAGACAAACACC. In this way, the reduced pseudoloop involves the nucleotides from the 8th to the 35th. In the following step, the hairpin $x_7[\alpha_7]$ is made explicit, where $x_7$ = (G,C) and $\alpha_7$ = GUGAGAACCGAGACAAACAC. Moreover, each time a hairpin is added, it is necessary to specify the nucleotide of the corresponding pseudoloop to which the hairpin is attached. Thus, the algebraic expression of the structure is

$$S = \alpha_h\, x_1[\alpha_1] \bowtie_{k_1} x_2[\alpha_2] \bowtie_{k_2} x_3[\alpha_3] \bowtie_{k_3} x_4[\alpha_4] \bowtie_{k_4} x_5[\alpha_5] \bowtie_{k_5} x_6[\alpha_6] \bowtie_{k_6} x_7[\alpha_7] \bowtie_{k_7} x_8[\alpha_8]\, \alpha_t \qquad (1)$$

where

$\alpha_h$ = CAAGAAC

$\alpha_1$ = GGACGGGUGAGAACC            $x_1$ = (C,G)

$\alpha_2$ = GACGGGUGAGAAC              $x_2$ = (G,C)

$\alpha_3$ = ACGGGUGAGAA                $x_3$ = (G,C)

$\alpha_4$ = AGAACCGAGACAAA             $x_4$ = (G,C)

$\alpha_5$ = GAGAACCGAGACAAAC           $x_5$ = (U,A)

$\alpha_6$ = UGAGAACCGAGACAAACA         $x_6$ = (G,C)

$\alpha_7$ = GUGAGAACCGAGACAAACAC       $x_7$ = (G,C)

$\alpha_8$ = GGUGAGAACCGAGACAAACACC     $x_8$ = (G,C)

$\alpha_t$ = ACCA
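The enclosed sequences listed for PKB66 can be cross-checked mechanically: whenever one hairpin is nested directly inside another, the outer enclosed sequence equals the closing pair of the inner hairpin wrapped around the inner enclosed sequence. The sketch below derives the closing pairs this way from the eight sequences. The hairpin numbering 1 to 8 (in the order listed), the nesting chains and the helper `closing_pair` are assumptions introduced for this example.

```python
# Weak-interaction alphabet: Watson-Crick and wobble pairs, as ordered pairs.
WEAK_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Enclosed sequences of the eight PKB66 hairpins, in the order listed.
ALPHAS = [
    "GGACGGGUGAGAACC",
    "GACGGGUGAGAAC",
    "ACGGGUGAGAA",
    "AGAACCGAGACAAA",
    "GAGAACCGAGACAAAC",
    "UGAGAACCGAGACAAACA",
    "GUGAGAACCGAGACAAACAC",
    "GGUGAGAACCGAGACAAACACC",
]


def closing_pair(outer_alpha, inner_alpha):
    """Closing pair of the hairpin nested directly inside `outer_alpha`."""
    if outer_alpha[1:-1] != inner_alpha:
        raise ValueError("inner sequence is not nested directly inside the outer one")
    pair = (outer_alpha[0], outer_alpha[-1])
    if pair not in WEAK_PAIRS:
        raise ValueError(f"{pair} is neither a Watson-Crick nor a wobble pair")
    return pair


# (outer, inner) hairpin numbers, read off the sequences (assumed nesting chains).
NESTED = [(1, 2), (2, 3), (8, 7), (7, 6), (6, 5), (5, 4)]
derived = {inner: closing_pair(ALPHAS[outer - 1], ALPHAS[inner - 1])
           for outer, inner in NESTED}
```

With these sequences, every derived pair is (G, C) except the one closing the hairpin that encloses GAGAACCGAGACAAAC, which is (U, A). The closing pairs of the two outermost hairpins (numbers 1 and 8) cannot be derived this way, because their flanking nucleotides are not part of any listed sequence.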

3.2 From the Algebraic Structure to the

Intersection Set

The grammar introduced in Section 3.1 associates a unique algebraic expression with each RNA secondary structure in terms of hairpins. Such an algebraic expression contains the structural and biological information of the molecule. With each algebraic expression, it is possible to associate an abstract expression, obtained from the first one by removing the nucleotides and introducing the position of the weak interaction in the structure. More specifically, each weak interaction divides the backbone into three parts, as illustrated in Figure 7, which are enumerated from left to right starting from 0.

Figure 7: Backbone divided by an arc.

For each algebraic expression

$$S = \alpha\, x_1[\alpha_1] \bowtie_{k_1} x_2[\alpha_2] \bowtie_{k_2} \cdots \bowtie_{k_{n-1}} x_n[\alpha_n]\, \alpha$$

the abstract algebraic expression is obtained by applying the procedure Abstract Algebraic Expression (Algorithm 1). In other words, this procedure takes as input the algebraic expression of an RNA molecule obtained from the multiple context-free grammar and returns another algebraic expression,

$$L_1 \bowtie_{t_1} L_2 \bowtie_{t_2} \cdots \bowtie_{t_{n-1}} L_n$$

Note that each t is a non-negative integer that represents the part of the backbone to which the successive loop is attached. Thus, the operator $\bowtie_t$ is different from the initial crossing operator: the initial one depends on nucleotides, whereas the second one depends on the part of the backbone. We decided to keep the same symbol in order not to overload the notation.

Data: algebraic expression of an RNA secondary structure
Result: abstract algebraic expression

N is the number of loops;
let α_i be the length of L_i;
let d be the length of P − α;
for i = 2 to N − 1 do
    compute P_i;
    s = 0;
    while s ≤ i do
        if k_{i−1} = 0 then
            t = 2(i − 1);
        else if k_{i−1} < P_{1+s} then
            if k_{i−1} > d then
                for j = 1 to i − 1 do
                    if k_{i−1} ≤ k_j then
                        t = j + s;
                end
        else
            s = s + 1;
end

Algorithm 1: Abstract Algebraic Expression.

We consider the RNA molecule PKB66 introduced before and illustrated in Figure 6. Starting from its algebraic expression (1) and applying the procedure Abstract Algebraic Expression, the corresponding abstract algebraic expression of the molecule is obtained. It is

= L

(2)

Let $\mathcal{S}$ be the set of abstract algebraic expressions. Let $E : \mathcal{S} \to W$ be a rewriting rule that associates another expression with each abstract expression. Each loop $L_i$ is indicated by its starting point, $x_i$, and its final point, $\bar{x}_i$, and a symbol $\wedge_k$ is associated with each $\bowtie_k$. Note that the non-negative integer parameter k is the same in both expressions.

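$$E\llbracket L_i \bowtie_k S \rrbracket = \begin{cases} E\llbracket L_i \rrbracket\, E\llbracket \bowtie_k \rrbracket\, E\llbracket S \rrbracket & \text{if } S = L_j \bowtie_h S' \ \vee\ S = L_j \\ \bot & \text{otherwise.} \end{cases}$$

$$E\llbracket L_i \rrbracket = x_i \bar{x}_i, \quad i \in \mathbb{N}$$

$$E\llbracket \bowtie_k \rrbracket = \wedge_k, \quad k \in \mathbb{N}$$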

Let $F : W \to w$ be a rewriting rule that associates with each element of $W$ a word that uniquely identifies the structure in terms of the initial and final points of its loops.

F Jω ∧

K =



.. .w

k−1

k+1

.. .w

if length(ω) > k

⊥ otherwise.

F Jω ∧

∧

K =



F Jω

∧

K if ω

= F Jω ∧

⊥ otherwise.

To illustrate an application of the previous rewriting rules, we again consider the molecule PKB66. Applying the rewriting rule E to the abstract algebraic expression (2), the following term is obtained

= x

∧

(3)

Applying the rewriting rule F to the previous term, we have

w = x

(4)

For each word, the intersection set is obtained by applying the procedure Intersection Loops Set (Algorithm 2). This set is composed of all the pairs of loops that cross each other. The intersection set of the structure illustrated in Figure 6, obtained by applying this procedure, is

V = {(L

),(L

)}

Data: w, the word associated with an RNA secondary structure
Result: the intersection set associated with the structure

N is the number of loops;
for i = 1 to N do
    select x_i and x̄_i;
    w_i is the subword from x_i to x̄_i;
    M_i is the length of w_i;
    V = V ∪ {L};
    for j = 1 to M_i do
        select w_i[j] = a;
        if a = x_h and x̄_h is an element of w_i then
            do nothing;
        else
            V = V ∪ {(L_i, L_h)};
end

Algorithm 2: Intersection Loops Set.
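The idea behind the intersection set can be sketched in Python as a direct pairwise check on the word: two loops cross when exactly one end point of the second lies between the end points of the first. This is an illustrative reformulation, not a transcription of Algorithm 2. The word encoding as a list of `(loop, "start")` and `(loop, "end")` tokens, the function name `intersection_set` and the toy word are assumptions.

```python
from itertools import combinations


def intersection_set(word):
    """Pairs of loops that cross, computed from a word of loop end points.

    `word` lists, in backbone order, tokens (loop_id, "start") and
    (loop_id, "end"); every loop occurs once with each kind.
    """
    start, end = {}, {}
    for position, (loop, kind) in enumerate(word):
        if kind == "start":
            start[loop] = position
        else:
            end[loop] = position

    crossing = set()
    for a, b in combinations(sorted(start), 2):
        # `first` is the loop that opens earlier along the backbone.
        first, second = (a, b) if start[a] < start[b] else (b, a)
        # Crossing: second opens inside first and closes outside it.
        if start[second] < end[first] < end[second]:
            crossing.add((a, b))
    return crossing


# Hypothetical word: loop 2 is nested in loop 1, loop 3 crosses loop 1.
toy_word = [(1, "start"), (2, "start"), (2, "end"),
            (3, "start"), (1, "end"), (3, "end")]
toy_set = intersection_set(toy_word)
```

Nested and concatenated loops contribute nothing to the set, so a pseudoknot-free structure yields the empty intersection set.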

3.3 A Measure for Comparing RNA

Secondary Structure

Each RNA secondary structure can be represented as an algebraic composition of hairpins, considered as basic loops. The new measure that we propose is based on the interactions among loops. Let $S_1$ and $S_2$ be two RNA secondary structures with pseudoknots, and let $V_1$ and $V_2$ be the respective intersection sets obtained by applying the methodology introduced in Section 3.2. Each element of the two sets represents an interaction between two loops. For example, if the pair $(L_i, L_j)$ is an element of $V_1$, then $L_i$ and $L_j$ are two loops of structure $S_1$ and they cross each other. Thus, two structures can be compared by taking advantage of set theory. Many methods have been proposed in the literature; in this case, the symmetric set difference is a good first approach to evaluate the difference between structures.

Definition 1: The interaction metrics $d$ is the cardinality of the symmetric difference between the sets of interactions among loops $V_1$ and $V_2$:

$$d(V_1, V_2) = |(V_1 \setminus V_2) \cup (V_2 \setminus V_1)|$$

where $V_1$ and $V_2$ are the intersection sets of structures $S_1$ and $S_2$, respectively. Note that $A \setminus B$ is the set of all elements that are in A but not in B. Hence, we count the crossings present in either of the structures, but not in both. This interaction loop distance is a metric. It is very strict: all differences have the same weight, and it does not take into account the backbone of the two structures.

Figure 8: The diagram of PKB10 molecule extracted from (Taufer et al., 2008).

To illustrate an application of the proposed measure, we consider two structures, $S_1$ and $S_2$. Let $S_1$ be the PKB66 molecule illustrated in Figure 6 and let $S_2$ be the PKB10 molecule illustrated in Figure 8, the tRNA-like structure 3′ end pseudoknot of ononis yellow mosaic virus. Its intersection loops set is

= {(L

),(L

)}

The distance in terms of interactions among loops between the two considered structures is $d(V, V') = 3$. In fact, the cardinality of $V \setminus V'$ is 0, since each element of $V$ is also in $V'$; conversely, the cardinality of $V' \setminus V$ is 3, because three pairs of loops are elements of $V'$ but not of $V$.
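The computation of the distance itself reduces to a set operation. The following Python sketch implements the cardinality of the symmetric difference on two intersection sets; the function name `interaction_distance` is an assumption, and the two toy sets are hypothetical (they are not the intersection sets of PKB66 and PKB10), chosen only to reproduce the situation in which one set is contained in the other.

```python
def interaction_distance(v1, v2):
    """Cardinality of the symmetric difference of two intersection sets.

    Each element is a pair of loop identifiers; pairs are treated as
    unordered, so (1, 4) and (4, 1) denote the same crossing.
    """
    first = {frozenset(pair) for pair in v1}
    second = {frozenset(pair) for pair in v2}
    return len(first ^ second)


# Hypothetical intersection sets with v contained in v_prime.
v = {(1, 4), (2, 4)}
v_prime = v | {(1, 5), (2, 5), (3, 5)}
toy_distance = interaction_distance(v, v_prime)
```

Because the first set is contained in the second, only the three extra pairs contribute, and the distance is symmetric in its arguments.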

4 RESULTS AND DISCUSSIONS

In this paper, we introduced a measure able to compare RNA secondary structures in terms of interactions among loops. In order to test the measure, we evaluated the evolutionary conservation of the pseudoknot domain of Vertebrate Telomerase RNA. Telomerase is a ribonucleoprotein enzyme that maintains telomere length by adding telomeric sequence repeats onto chromosome ends. The essential RNA component of telomerase provides the template for telomeric repeat synthesis. The main feature of the secondary structure of Vertebrate Telomerase RNA is the evolutionary conservation of four structural domains: the pseudoknot domain, the CR4-CR5 domain, the Box H/ACA domain and the CR7 domain (Chen et al., 2000). Applying the proposed methodology to the two pseudoknots, the distance is

$$d(V_{human}, V_{shark}) = 0$$

where $V_{human}$ and $V_{shark}$ are the intersection sets of the pseudoknot domains of human and sharpnose shark telomerase RNAs, respectively. The result, $d(V_{human}, V_{shark}) = 0$, shows that each interaction between loops belongs to both molecules. As a consequence, the structure is conserved, in accordance with the results present in the literature (Chen et al., 2000). Moreover, this measure, being able to capture the interactions between loops, can also be applied to classify molecules. Although the two molecules of a pair are characterized by a functional similarity, the interactions among loops can differ. For example, we consider a pair of molecules, extracted from (Taufer et al., 2008), that involve structural elements for translation initiation and ribosome recruitment found in the viral internal ribosome entry site (PKB223) and the V4 domain of 18S rRNA (PKB205) (Pasquali et al., 2005). Applying the proposed methodology to the two pseudoknots, the distance is $d(V_{PKB223}, V_{PKB205}) = 24$, where $V_{PKB223}$ and $V_{PKB205}$ are the intersection sets of the PKB223 and PKB205 molecules, respectively. This information can be considered as a structural constraint to guide the secondary structure folding. In fact, the biological presumption is that the RNA structure folds hierarchically: during the folding process, pseudoknot-free structures are formed first, whereas pseudoknotted motifs are generated later to minimize the energy. Thus, a classification of the structures is useful to understand how the structure evolves. Moreover, the measure can also be used to detect a mutation. At a structural level, the measure is able to capture the interactions between the loops. Each interaction between two loops is determined by a crossing of two base pairs. Taking the crossings among base pairs into consideration permits the definition of a more precise energy function than the standard one (Vernizzi et al., 2016).

The introduced measure is obtained by taking advantage of set theory. In particular, the symmetric difference of sets has been used. Other similarity functions can be applied to reach a more accurate measure based on interactions among loops. A list of usable similarity functions is reported in Table 1.

Table 1: Similarity functions over two sets X and Y.

Similarity function    Definition

Intersection           $S(X,Y) = |X \cap Y|$

Cosine                 $S(X,Y) = \frac{|X \cap Y|}{\sqrt{|X|\,|Y|}}$

Dice                   $S(X,Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}$

Hamming                $S(X,Y) = |(X \cap \overline{Y}) \cup (\overline{X} \cap Y)|$

Jaccard                $S(X,Y) = \frac{|X \cap Y|}{|X \cup Y|}$
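For reference, the five functions of Table 1 can be written directly with Python sets. This is a minimal sketch under two assumptions: the Hamming entry is read as the size of the symmetric difference, and the ratio-based functions are only called on non-empty sets. The function names are chosen for this example.

```python
import math


def intersection_similarity(x, y):
    return len(x & y)


def cosine_similarity(x, y):
    # Assumes both sets are non-empty (otherwise the denominator is zero).
    return len(x & y) / math.sqrt(len(x) * len(y))


def dice_similarity(x, y):
    # Assumes at least one set is non-empty.
    return 2 * len(x & y) / (len(x) + len(y))


def hamming(x, y):
    # Read here as the number of elements belonging to exactly one set.
    return len(x ^ y)


def jaccard_similarity(x, y):
    # Assumes the union is non-empty.
    return len(x & y) / len(x | y)


# Hypothetical intersection sets sharing two of their three crossings.
x = {(1, 4), (2, 4), (3, 4)}
y = {(2, 4), (3, 4), (3, 5)}
```

Note that the Hamming function grows with dissimilarity, whereas the other four grow with similarity, so they are not directly interchangeable as distances.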

From an algorithmic point of view, in order to define the measure we obtained, for each RNA molecule, a word that uniquely represents its secondary structure. Over this word, it is possible to define a set of rewriting rules that yields the shape of each molecule. The shape is a topological concept widely used by Bon et al. (Bon et al., 2008) and Reidys et al. (Reidys et al., 2011). Moreover, it is also possible to define an algorithm to compute some topological invariants, such as the genus and the crossing number (Vernizzi et al., 2016). Another procedure over the word can easily be defined to detect whether or not a pseudoknot belongs to a given class. Understanding whether two structures are characterized by the same pseudoknots is useful for choosing the particular algorithm for comparing the two structures, taking into account biologically relevant operations such as the addition, deletion, and substitution of nucleotides or base pairs.

5 CONCLUSIONS

The biological function of an RNA molecule depends on its structure. As a consequence, the molecule cannot sustain substantial changes to its secondary and tertiary structures if it is to preserve a particular function. Thus, knowledge of the structure is very important, and the ability to compare RNA structure motifs supports the study of the function and evolution of RNA.

In this paper, we proposed a measure to compare RNA secondary structures with pseudoknots in terms of interactions among loops. From a biological point of view, it is useful to identify the structures conserved during evolution, since the primary structure is often not preserved. In fact, this measure is able to detect global properties of the molecules by taking advantage of set theory. A benefit is that it can be computed quickly, and its properties make the measure easy to handle theoretically. A statistical study over a large set of molecules can be performed in order to determine a new clustering, which can then be compared with those obtained from different approaches present in the literature.

We plan to improve the developed software that implements the measure and the whole methodology presented in this paper, in order to investigate and analyze in statistical terms the correlations between the proposed measure and the functions of RNAs. Moreover, we plan to evaluate the five similarity functions in order to compare their performance as measures. To reach these goals, we have decided to compare molecules extracted from the Rfam database (Nawrocki et al., 2015). This database classifies non-coding RNAs into families whose members possess a similar secondary structure, suggesting evolutionary relationships and similar functions. Moreover, this database provides a consensus secondary structure for each family.

ACKNOWLEDGEMENTS

We acknowledge the ﬁnancial support of the Fu-

ture and Emerging Technologies (FET) programme

within the Seventh Framework Programme (FP7)

for Research of the European Commission, un-

der the FET-Proactive grant agreement TOPDRIM

(www.topdrim.eu), number FP7-ICT-318121.

REFERENCES

Blin, G. and Touzet, H. (2006). How to compare arc-

annotated sequences: The alignment hierarchy. In In-

ternational Symposium on String Processing and In-

formation Retrieval, pages 291–303. Springer.

Bon, M., Vernizzi, G., Orland, H., and Zee, A. (2008).

Topological classiﬁcation of RNA structures. Journal

of molecular biology, 379(4):900–911.

Burke, D. H., Scates, L., Andrews, K., and Gold, L. (1996).

Bent pseudoknots and novel rna inhibitors of type 1

human immunodeﬁciency virus (hiv-1) reverse tran-

scriptase. Journal of molecular biology, 264(4):650–

666.

Chen, J.-L., Blasco, M. A., and Greider, C. W. (2000). Sec-

ondary structure of vertebrate telomerase rna. Cell,

100(5):503–514.

Corpet, F. and Michot, B. (1994). Rnalign program: align-

ment of rna sequences using both primary and sec-

ondary structures. Computer applications in the bio-

sciences: CABIOS, 10(4):389–399.

Dill, K. (1990). Dominant forces in protein folding. Bio-

chemistry, 29(31):7133–7155.

Ding, Y., Chan, C. Y., and Lawrence, C. E. (2005). Rna

secondary structure prediction by centroids in a boltz-

mann weighted ensemble. Rna, 11(8):1157–1166.

Dulucq, S. and Tichit, L. (2003). Rna secondary structure

comparison: exact analysis of the zhang–shasha tree

edit algorithm. Theoretical Computer Science, 306(1-

3):471–484.

Evans, P. (1999). Algorithms and Complexity for Anno-

tated Sequences Analysis. PhD thesis, University of

Victoria.

Evans, P. A. (2011). Finding common rna pseudoknot struc-

tures in polynomial time. Journal of Discrete Algo-

rithms, 9(4):335–343.

Ferré-D’Amaré, A. and Doudna, J. (1999). Rna folds: in-

sights from recent crystal structures. Annual review of

biophysics and biomolecular structure, 28(1):57–73.

Harrison, M. A. (1978). Introduction to formal language

theory. Addison-Wesley Longman Publishing Co.,

Inc.

Herrbach, C., Denise, A., and Dulucq, S. (2010). Av-

erage complexity of the jiang–wang–zhang pairwise

tree alignment algorithm and of a rna secondary struc-

ture alignment algorithm. Theoretical Computer Sci-

ence, 411(26):2423–2432.

BIOINFORMATICS 2018 - 9th International Conference on Bioinformatics Models, Methods and Algorithms

Laskowski, R. and Thornton, J. (2008). Understanding the

molecular machinery of genetics through 3D struc-

tures. Nature Reviews Genetics, 9(2):141–151.

Le, S.-Y., Owens, J., Nussinov, R., Chen, J.-H., Shapiro,

B., and Maizel, J. V. (1989). Rna secondary struc-

tures: comparison and determination of frequently re-

curring substructures by consensus. Bioinformatics,

5(3):205–210.

Möhl, M., Will, S., and Backofen, R. (2010). Lifting pre-

diction to alignment of rna pseudoknots. Journal of

Computational Biology, 17(3):429–442.

Moulton, V., Zuker, M., Steel, M., Pointon, R., and Penny,

D. (2000). Metrics on rna secondary structures. Jour-

nal of Computational Biology, 7(1-2):277–292.

Nawrocki, E. P., Burge, S. W., Bateman, A., Daub, J., Eber-

hardt, R. Y., Eddy, S. R., Floden, E. W., Gardner, P. P.,

Jones, T. A., Tate, J., and Finn, R. D. (2015). Rfam

12.0: updates to the rna families database. Nucleic

Acids Research, 43(D1):D130–D137.

Pasquali, S., Gan, H. H., and Schlick, T. (2005). Modu-

lar rna architecture revealed by computational analysis

of existing pseudoknots and ribosomal rnas. Nucleic

Acids Research, 33(4):1384–1398.

Quadrini, M., Culmone, R., and Merelli, E. (2017). Topo-

logical classiﬁcation of rna structures via intersec-

tion graph. Accepted to 6th International Confer-

ence on the Theory and Practice of Natural Computing

(TPNC).

Rastegari, B. and Condon, A. (2007). Parsing nucleic

acid pseudoknotted secondary structure: algorithm

and applications. Journal of computational biology,

14(1):16–32.

Rastogi, T., Beattie, T. L., Olive, J. E., and Collins, R. A.

(1996). A long-range pseudoknot is required for ac-

tivity of the neurospora vs ribozyme. The EMBO jour-

nal, 15(11):2820.

Reidys, C. M., Huang, F., Andersen, J. E., Penner, R. C.,

Stadler, P. F., and Nebel, M. E. (2011). Topology

and prediction of rna pseudoknots. Bioinformatics,

27(8):1076–1085.

Seki, H., Matsumura, T., Fujii, M., and Kasami, T. (1991).

On multiple context-free grammars. Theoretical Com-

puter Science, 88(2):191–229.

Shapiro, B. A. (1988). An algorithm for comparing multiple

rna secondary structures. Computer applications in

the biosciences: CABIOS, 4(3):387–393.

Shapiro, B. A. and Zhang, K. (1990). Comparing multi-

ple rna secondary structures using tree comparisons.

Bioinformatics, 6(4):309–318.

Shen, L. X. and Tinoco, I., Jr. (1995). The structure of an rna pseu-

doknot that causes efﬁcient frameshifting in mouse

mammary tumor virus. Journal of molecular biology,

247(5):963–978.

Song, Y., Hua, L., Shapiro, B. A., and Wang, J. T. (2015).

Effective alignment of rna pseudoknot structures us-

ing partition function posterior log-odds scores. BMC

Bioinformatics, 16(1).

Staple, D. W. and Butcher, S. E. (2005). Pseudoknots: Rna

structures with diverse functions. PLOS Biology, 3(6).

Taufer, M., Licon, A., Araiza, R., Mireles, D., Van Baten-

burg, F., Gultyaev, A. P., and Leung, M.-Y. (2008).

Pseudobase++: an extension of pseudobase for easy

searching, formatting and visualization of pseudo-

knots. Nucleic acids research, 37(suppl 1):D127–

D135.

Vernizzi, G., Orland, H., and Zee, A. (2016). Classiﬁcation

and predictions of rna pseudoknots based on topolog-

ical invariants. Physical Review E, 94(4):042410.
