Formal Analysis of Rewriting System Representing RNA Folding

Krishnendu Ghosh (1, a) and Julia Goldman (2, b)

(1) Department of Computer Science, College of Charleston, SC, U.S.A.

(2) Department of Mathematics, Texas Christian University, TX, U.S.A.

Keywords:

RNA Folding, Probabilistic Model Checking, Rewriting System, Stochastic Modeling.

Abstract:

Prediction of RNA structure is an important problem in understanding biological processes in living organisms. Computational models have been created to study these processes with the aim of unravelling RNA structure. In this work, a novel formalism for the formal analysis of RNA structure prediction is described. A graph rewriting system is formalized to represent the structural dynamics of RNA under uncertainty. Probabilistic model checking is performed on queries seeking structural properties of RNA. Experiments were conducted to evaluate the computational feasibility of the model.

1 INTRODUCTION

The role of RNA in biological processes such as gene expression and inhibition is immensely significant (Riddihough, 2016). The structural dynamics of RNA provide insights into these biological processes. RNA secondary structure prediction is critical in understanding the function of RNA. The primary structure of RNA is represented by a sequence of the nucleotides A, U, G, and C. The RNA secondary structure is formed by the folding of an RNA strand through the formation of hydrogen bonds. RNA pseudoknots are formed from Watson-Crick base pairing. It is accepted that the secondary RNA structure is predicted based on the minimum free energy for stability.

The problem of predicting RNA secondary structure containing pseudoknots is NP-complete for a large class of pseudoknots (Lyngsø and Pedersen, 2000). The design of secondary structures using Watson-Crick base pairs is NP-complete in a more realistic model of RNA sequence (Bonnet et al., 2020). The prediction of RNA secondary structure is computationally intensive, and hence novel methods are needed. Machine learning algorithms have been studied for RNA secondary structure prediction (Zhao et al., 2021). Given the black-box nature of deep learning (Sato et al., 2021), such methods are of limited use to biologists who want to understand the complete process of the structural dynamics of RNA. Probabilistic models have been useful in modeling RNA secondary structure when different sources of data, such as homologous RNA sequences and thermodynamic parameters of the energy minimization model, are combined to predict structure (Dowell and Eddy, 2004).

a https://orcid.org/0000-0002-8471-6537

b https://orcid.org/0000-0003-1963-5914

The goal of this work is to evaluate the structural changes that an RNA undergoes as its free energy changes. It is not possible to obtain precise values of free energy through experiments for studying the structural dynamics of RNA, so computational methods have been sought to study changes in the RNA structure. Our work constructs a formalism based on a rewriting system under uncertainty. The computational challenge is to relate the structural changes to the minimum free energy. We consider the minimum free energy to be the reason for the RNA structural changes. The rewriting rules represent the change from one structure to another. The contribution of this work is to model RNA structural dynamics with a finite state machine under uncertainty and then apply temporal logic as a querying mechanism to evaluate RNA structural dynamics.

Model checking is a technique that verifies dynamic properties on a finite state machine representing the system. The correctness of software, network protocols, and hardware has been verified using model checking. Model checking represents a system symbolically and not explicitly. The time complexity of model checking is polynomial in the size of the model. Properties or specifications are stated in the form of temporal logic formulas, which are precise properties posed as queries to the finite state machine representation of the system. In probabilistic model checking, properties expressed in different computational logics are posed as queries to stochastic structures.

Ghosh, K. and Goldman, J. Formal Analysis of Rewriting System Representing RNA Folding. DOI: 10.5220/0011734300003414. In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 3: BIOINFORMATICS, pages 235-242. ISBN: 978-989-758-631-6; ISSN: 2184-4305. © 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

We create a formalism to study RNA structural dynamics by representing RNA structures with graph rewriting, and uncertainty in the dynamics is incorporated using stochastic models. Stochastic models represent the uncertainty that arises when RNA strands transition from one structure to another at different, random rates. The computational feasibility and properties of the model are evaluated experimentally using the PRISM software (Kwiatkowska, 2003). To the best of our knowledge, this is the first work that demonstrates the application of model checking to RNA structure prediction.

2 RELATED WORK

In this section, we describe the related work on modeling of RNA secondary structure prediction based on discrete structures and inferences using logic-based approaches.

Formal language approaches have been investigated in the modeling of RNA structure. An algebraic language for tree representation of RNA secondary structure was described (Quadrini et al., 2019). The operations defined on the language were concatenation, nesting, and crossing. Concatenation was used for motifs where one structure follows another. Nesting was used to show when a structure had been inserted into a hairpin, and crossing was used to show the interaction between structures. The three operations are used to create a unique tree representation for each RNA structure. To ensure that all RNA secondary structures could be expressed, the operators were used to represent pseudoknots as a unique combination of hairpins, the most basic loop structure. A novel method to compare RNA secondary structures using specific representations of secondary structures based on algebraic trees has been reported (Quadrini et al., 2020). RNA pseudoknots have been modeled using term rewriting (Fu et al., 2008).

Formal grammars have been proposed (Jonoska et al., 2021) for modeling RNA:DNA interactions and the formation of R-loops (3-stranded nucleic acid hybrid structures). RNA folding was modeled as graph transformation in the presence of free energy (Mamuye et al., 2016). In this model, each RNA configuration was represented as a graph and the evolution of configurations was rule based, represented by a graph grammar.

SAT solvers were also applied in RNA secondary structure prediction (Ganesh et al., 2012). The user-provided code included structural constraints (biological properties of the RNA structure) and energy constraints (quantitative requirements). Specifically, the work addresses the correct attribution of a structural state to each nucleic acid within an RNA sequence. Danos et al. (2012) construct pathways using a new graph-based semantics system and a rule-based language for protein-protein interactions called Kappa. Single pushout (SPO) is the technique used for this model. This means that there will be a left-hand side, a right-hand side, and a domain of definition.

RNA can be described using an alphabet of the nucleotides (A, U, G, C), and its secondary structure can be described by the ways in which the nucleotides bond with each other. Often, the optimal secondary structure is predicted to be the one with minimum free energy (MFE). In the Watson-Crick model this would be the structure with the most base pairs. The prediction of the RNA structure with MFE is evaluated for models that do not contain pseudoknots. This is called the RNA folding problem. Inclusion of pseudoknots in the problem essentially causes it to be NP-complete (Bonnet et al., 2020). The RNA design problem involves finding a sequence of nucleotides that folds into a given secondary structure. RNA Design Extension is the same, except for the added condition that some indices of the sequence must contain a specified base. Sampling from the Boltzmann distribution to generate suboptimal RNA structures has been reported (Rogers et al., 2017). Algorithmic construction of RNA secondary structures was also investigated, with the result that designing RNA secondary structures in the Watson-Crick model was proved to be NP-hard if the input structure is labeled with bases at some designated positions (Bonnet et al., 2020).

There is a body of literature on model checking in systems biology, in particular using stochastic models, which has been an active research area for a decade (Kwiatkowska and Thachuk, 2014). Formal methods such as model checking have been used as a querying mechanism on models of biochemical pathways (Heath et al., 2008; Chabrier-Rivier et al., 2004).

3 PRELIMINARIES

In this section, we give the definitions on which the formalism for RNA structure prediction is based. The formalism integrates concepts from multiple topics: stochastic structures (discrete-time Markov chains and continuous-time Markov chains), probabilistic model checking, and graph rewriting.

BIOINFORMATICS 2023 - 14th International Conference on Bioinformatics Models, Methods and Algorithms

The state-based definition of stochastic structures such as the discrete-time Markov chain (Baier et al., 2008) is:

Definition 1. (Discrete-Time Markov Chain (DTMC)) A discrete-time Markov chain is a tuple $M = \langle S, S_0, \iota_{init}, P, L \rangle$ where:

1. $S$ is a finite set of states.

2. $S_0 \subseteq S$ is the set of initial states.

3. $P : S \times S \to [0,1]$, where $P$ represents the probability matrix and $\sum_{s' \in S} P(s, s') = 1$ for every $s \in S$.

4. $\iota_{init} : S \to [0,1]$, where $\sum_{s \in S} \iota_{init}(s) = 1$, is the initial distribution.

5. $L : S \to 2^{AP}$, where $L$ is a labeling function and $AP$ is the set of atomic propositions.
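As a minimal sketch of Definition 1, the tuple can be held in a small Python container whose checks mirror the conditions on the probability matrix and the initial distribution. The class name, the field names, and the two-state toy chain are illustrative assumptions and are not taken from the PRISM model used in the experiments.

```python
from dataclasses import dataclass


@dataclass
class DTMC:
    """Container mirroring the DTMC tuple (names are illustrative)."""
    states: set
    initial_states: set
    init_dist: dict   # initial distribution: state -> probability
    P: dict           # (s, s') -> probability; missing pairs mean 0
    labels: dict      # state -> set of atomic propositions

    def validate(self, tol=1e-9):
        # Initial states must be states of the chain.
        if not self.initial_states <= self.states:
            raise ValueError("initial states must be a subset of S")
        # Every row of the probability matrix must sum to 1.
        for s in self.states:
            row = sum(self.P.get((s, t), 0.0) for t in self.states)
            if abs(row - 1.0) > tol:
                raise ValueError(f"row of {s!r} sums to {row}, expected 1")
        # The initial distribution must sum to 1.
        total = sum(self.init_dist.get(s, 0.0) for s in self.states)
        if abs(total - 1.0) > tol:
            raise ValueError("initial distribution must sum to 1")
        return True


# Hypothetical two-state chain: "a" may stay or move to the absorbing "b".
toy = DTMC(
    states={"a", "b"},
    initial_states={"a"},
    init_dist={"a": 1.0},
    P={("a", "a"): 0.25, ("a", "b"): 0.75, ("b", "b"): 1.0},
    labels={"a": {"unfolded"}, "b": {"folded"}},
)
toy.validate()
```

Storing the matrix sparsely as a dictionary keeps the sketch short; a dense matrix would serve equally well for models of this size.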

Definition 2. (Labeled Continuous-Time Markov Chain (LCTMC)) A labeled continuous-time Markov chain (Baier et al., 2008) is a tuple $K = \langle S, S_0, R, AP, L \rangle$ where:

1. $S$ is a set of states.

2. $S_0 \subset S$ is the set of initial states.

3. $R : S \times S \to \mathbb{R}_{\geq 0}$ is the rate matrix.

4. $L : S \to 2^{AP}$ is a labeling function.

The labeled CTMC described in Definition 2 eliminates the requirement $R(s,s) = -\sum_{s' \neq s} R(s,s')$, unlike the non-state-based definition of CTMCs. Self-loops are modeled by $R(s,s) > 0$.
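To make the role of the rate matrix concrete, the sketch below derives exit rates and the jump probabilities of the embedded DTMC from a rate dictionary, using the standard construction in which the probability of moving from one state to another is its rate divided by the total exit rate. The function names and the toy rates are assumptions introduced for illustration only.

```python
def exit_rate(R, states, s):
    """Total rate of leaving s; a self-loop rate R(s, s) is counted as well."""
    return sum(R.get((s, t), 0.0) for t in states)


def embedded_dtmc(R, states):
    """Jump probabilities R(s, s') / E(s); states with E(s) = 0 become absorbing."""
    P = {}
    for s in states:
        total = exit_rate(R, states, s)
        if total == 0.0:
            P[(s, s)] = 1.0
            continue
        for t in states:
            rate = R.get((s, t), 0.0)
            if rate > 0.0:
                P[(s, t)] = rate / total
    return P


# Hypothetical rates: "u" branches to "h" or "b"; "h" only has a self-loop.
toy_states = ["u", "h", "b"]
toy_R = {("u", "h"): 3.0, ("u", "b"): 1.0, ("h", "h"): 0.5}
toy_P = embedded_dtmc(toy_R, toy_states)
```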

Definition 3. (Probabilistic Model Checking) Given a probabilistic model $M$ and a formula $\phi$, model checking is the process of computing the answer to the question of whether $M \models \phi$ holds.

PCTL syntax includes state formulas $\phi$ and path formulas $\psi$. Within the formulas, the next, bounded until, and until operators are allowed (Parker, 2003).

3.1 Probabilistic Computation Tree Logic

We describe the syntax and semantics of probabilistic computation tree logic (PCTL) (Aziz et al., 1995; Hansson and Jonsson, 1994). The syntax of PCTL is:

$$\phi ::= \mathit{true} \mid p \mid \phi \wedge \phi \mid \neg\phi \mid P_{\oplus J}[\psi]$$

$$\psi ::= X\,\phi \mid \phi\, U^{\leq k}\, \phi \mid \phi\, U\, \phi$$

where $p$ is an atomic proposition, $\oplus \in \{\leq, <, \geq, >\}$, $J \in [0,1]$ and $k \in \mathbb{N}$; $\phi$ and $\psi$ are state and path formulas, respectively. Each of these formulas is interpreted over a DTMC or an MDP. Each state of the DTMC or MDP is labeled from the set of atomic propositions. A specification is represented in the form of a state formula. Path formulas $\psi$ are preceded by the probabilistic path operator $P$. As an example of the intervals that bound $P$, $P_{\leq 0.5}(\psi)$ denotes $P_{[0,0.5]}(\psi)$. A state $s$ of a DTMC satisfies $P_{\oplus J}[\psi]$ if the probability of a path from $s$ satisfying $\psi$ is within the bound stated by $\oplus J$. The path formula $X\phi$ is true if $\phi$ is satisfied in the next state. The formula $\phi_1 U^{\leq k} \phi_2$ is true if $\phi_2$ is satisfied within $k$ time-steps and $\phi_1$ is true up to that point. Similarly, $\phi_1 U \phi_2$ is true if $\phi_2$ is true at some point in the future and $\phi_1$ is true until then.

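The semantics of PCTL over a DTMC (Parker, 2003) is given as follows. Given a DTMC $M$ with state set $S$ and a PCTL formula, the notation $s \models \phi$ denotes that $\phi$ is satisfied in state $s$. For a path $\pi$ satisfying a PCTL path formula $\psi$, the notation is $\pi \models \psi$.

For a path $\pi$:

1. $\pi \models X\phi$ iff $\pi(1) \models \phi$.

2. $\pi \models \phi_1 U^{\leq k} \phi_2$ iff $\exists i \leq k.\ (\pi(i) \models \phi_2 \wedge \pi(j) \models \phi_1,\ \forall j < i)$.

3. $\pi \models \phi_1 U \phi_2$ iff $\exists k \geq 0,\ \pi \models \phi_1 U^{\leq k} \phi_2$.

For a state $s \in S$:

1. $s \models \mathit{true}$, $\forall s \in S$.

2. $s \models a$ iff $a \in L(s)$.

3. $s \models \phi_1 \wedge \phi_2$ iff $s \models \phi_1 \wedge s \models \phi_2$.

4. $s \models \neg\phi$ iff $s \not\models \phi$.

5. $s \models P_{\oplus J}[\psi]$ iff $p_s(\psi) \oplus J$,

where $p_s(\psi) = Pr_s(\{\pi \in Path(s) \mid \pi \models \psi\})$, $Pr_s$ is the probability measure over the paths starting in $s$, and $Path(s)$ is the set of such paths, each consisting of a non-empty sequence of states in the DTMC.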
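The path semantics of the bounded until operator translate into a simple iteration over the probability matrix. The sketch below uses the standard recurrence for bounded until on a DTMC (probability 1 in states satisfying the second formula, 0 in states satisfying neither formula, and a one-step weighted sum otherwise). The function names and the three-state toy chain are illustrative assumptions, not part of the model in this paper.

```python
import operator

COMPARE = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}


def bounded_until(states, P, sat1, sat2, k):
    """Probability of (phi1 U<=k phi2) from every state of a DTMC.

    sat1 and sat2 are the sets of states satisfying phi1 and phi2.
    """
    prob = {s: 1.0 if s in sat2 else 0.0 for s in states}
    for _ in range(k):
        nxt = {}
        for s in states:
            if s in sat2:
                nxt[s] = 1.0          # phi2 already holds
            elif s not in sat1:
                nxt[s] = 0.0          # neither formula holds: path fails
            else:
                nxt[s] = sum(P.get((s, t), 0.0) * prob[t] for t in states)
        prob = nxt
    return prob


def satisfies(states, P, sat1, sat2, k, op, bound, s):
    """Check the state formula P_{op bound}[phi1 U<=k phi2] in state s."""
    return COMPARE[op](bounded_until(states, P, sat1, sat2, k)[s], bound)


# Hypothetical chain: 0 loops or moves to 1, 1 moves to the absorbing 2.
chain_states = [0, 1, 2]
chain_P = {(0, 0): 0.5, (0, 1): 0.5, (1, 2): 1.0, (2, 2): 1.0}
within_three = bounded_until(chain_states, chain_P, {0, 1}, {2}, 3)
```

Letting the bound grow until the values stop changing gives an approximation of the unbounded until operator, in line with item 3 of the path semantics.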

CTMCs can be described by two properties: transient behavior and steady-state behavior. Transient behavior describes the system at a particular moment in time, whereas steady-state behavior describes the system in the long run.

The temporal logic used to specify properties of CTMCs is called continuous stochastic logic (CSL). In addition to the operators used in PCTL, CSL also uses the time-bounded until operator and the steady-state operator S (Parker, 2003).

3.2 Continuous Stochastic Logic

Model checking on a CTMC is performed with continuous stochastic logic (CSL) (Aziz et al., 1996; Baier et al., 1999). The syntax of CSL (Aziz et al., 1996) is:

$$\phi ::= \mathit{true} \mid a \mid \phi \wedge \phi \mid \neg\phi \mid P_{\oplus p}[\psi] \mid S_{\oplus p}[\phi]$$

$$\psi ::= X\,\phi \mid \phi\, U^{\leq k}\, \phi \mid \phi\, U\, \phi$$

where $a$ is an atomic proposition, $\oplus \in \{\leq, <, \geq, >\}$, $p \in [0,1]$ and $k \in \mathbb{R}_{\geq 0}$; $\phi$ and $\psi$ are state and path formulas, respectively. $P_{\oplus p}[\psi]$ states that the probability of $\psi$ being satisfied from a given state satisfies the bound $\oplus p$. The bounded until operator $\phi_1 U^{\leq k} \phi_2$ is valid if $\phi_2$ holds at a time instant in the interval $[0, k]$ and $\phi_1$ is valid at all preceding time instants. The other until operator, $U$, does not depend on a time bound. A state $s$ of a CTMC satisfies $P_{\oplus p}[\psi]$ if the probability of a path from $s$ satisfying $\psi$ is within the bound stated by $\oplus p$. The path formula $X\phi$ is true if $\phi$ is satisfied in the next state. Similarly, $\phi_1 U \phi_2$ is true if $\phi_2$ is true at some point in the future and $\phi_1$ is true until then.

3.3 RNA Probabilistic Rewriting System

We define the language on RNA graphs and graph rewriting following (Ehrig et al., 2006). The dynamics of RNA structure are modeled using a rewriting system. In our formalization, we construct an RNA structural graph and then create a model that leverages the graph rules under uncertainty. Our model integrates the representation of the RNA graph and the uncertainty in the folding of RNA.

Definition 4. An RNA-graph is a graph $(V, E, L, L')$ where vertices represent bases and edges represent the bonds between bases such that

1. $V$ is the set of vertices.

2. $E$ is the set of edges, where each $e \in E$ has the form $e = \langle v, v' \rangle$ with $v, v' \in V$.

3. (No self loop) $\nexists \langle v, u \rangle \in E$ such that $v = u$.

4. (Labeling function) $L : V \to Ba$, where $Ba$ is the set of bases and $Ba = \{A, G, U, C\}$.

5. (Edge labeling function) $L' : E \to Bo$, where $Bo$ is the set of bonds.
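A small sketch of how such a graph could be built and checked is given below. It uses the sample strand from the simulation section and the two base pairs of the first folding step; the function name and the bond names "backbone" and "hydrogen" are assumptions made for illustration, since the set of bonds is not enumerated in the definition.

```python
BASES = {"A", "G", "U", "C"}


def build_rna_graph(sequence, base_pairs):
    """Return (vertices, edge -> bond label, vertex -> base) for a strand.

    Positions are 1-based. Consecutive bases are joined by a 'backbone'
    edge and each listed pair by a 'hydrogen' edge (illustrative names).
    """
    vertices = list(range(1, len(sequence) + 1))
    base_of = {v: sequence[v - 1] for v in vertices}
    unknown = set(base_of.values()) - BASES
    if unknown:
        raise ValueError(f"unknown bases: {sorted(unknown)}")
    bond_of = {(v, v + 1): "backbone" for v in vertices[:-1]}
    for i, j in base_pairs:
        if i == j:
            raise ValueError("self loops are not allowed")
        if i not in base_of or j not in base_of:
            raise ValueError(f"pair ({i}, {j}) is outside the strand")
        bond_of[(min(i, j), max(i, j))] = "hydrogen"
    return vertices, bond_of, base_of


strand = "CUUACCAUCGGGUUAGAGGAG"
V, E, L = build_rna_graph(strand, [(1, 21), (2, 20)])
```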

In the construction of the graph rewriting system, the rules are triggered with a probability (Krause and Giese, 2012). The probabilistic folding model is adapted from probabilistic timed graph transformation systems (Maximova et al., 2018), where Dist(Z) is the set of probability distributions on the set of rules Z.

Definition 5. (Probabilistic folding rule) A probabilistic folding rule is a tuple $\kappa = \langle G, Z, \mu \rangle$ where

1. $G$ is the RNA graph.

2. $Z$ is a non-empty finite set of rules such that $G = L$ for all $z = \langle L \leftarrow K \rightarrow R \rangle \in Z$, where $L$ is the left-hand side of the rule, and $\mu \in Dist(Z)$.

Here, there are multiple right-hand sides $R$ for a single $G$. For the RNA structure model, $Z = \{\text{hairpin}, \text{bulge}, \text{helix}, \text{internal loop}\}$.
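The choice of which rule fires can be pictured as sampling from the distribution over the four rule names. The sketch below is only an illustration: the probabilities in `mu` are hypothetical and are not values used in the paper, and the helper names are assumptions.

```python
import random

RULES = ("hairpin", "bulge", "helix", "internal loop")


def check_distribution(mu, tol=1e-9):
    """Verify that mu assigns a probability to every rule and sums to 1."""
    if set(mu) != set(RULES):
        raise ValueError("mu must assign a probability to every rule in Z")
    if any(p < 0.0 for p in mu.values()):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(mu.values()) - 1.0) > tol:
        raise ValueError("probabilities must sum to 1")
    return True


def trigger_rule(mu, rng):
    """Sample the rule that fires on the current RNA graph."""
    check_distribution(mu)
    names = list(mu)
    return rng.choices(names, weights=[mu[n] for n in names], k=1)[0]


# Hypothetical distribution over the rule set Z.
mu = {"hairpin": 0.1, "bulge": 0.2, "helix": 0.6, "internal loop": 0.1}
rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
counts = {name: 0 for name in RULES}
for _ in range(10_000):
    counts[trigger_rule(mu, rng)] += 1
```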

Rewrite rules generate graph transformations.

Our model focuses on representing the secondary

structure of RNA molecules. Secondary structure

refers to the ordered sequence of bases and the bonds

that connect them.

3.4 Model

The model based on a discrete-time Markov chain $M$ is the following. An RNA graph $G$, called the starting graph, is transformed into a graph $G'$ in the next state when one of the rules $z \in Z$ triggers. A transition from a state $s$ to a state $s'$ is read as: the RNA graph $G$ is transformed under rule $z$ into $G'$ with probability $p$, such that the probabilities of the transitions leaving $s$ sum to 1. Here, $s, s' \in S$, where $S$ is the set of states in $M$. For the CTMC variant, there is no requirement that these values sum to one; the rates are the labels on the transitions. A transition from a state $s$ to a state $s'$ is then read as: the RNA graph $G$ is transformed under rule $z$ into $G'$ with rate $q$.

The finite state machine $K$ of the RNA structural dynamics is represented as follows. Given are a finite set of RNA structures $S_{rna} = \{st_1, st_2, \ldots, st_n\}$ and a finite set of minimum free energies $FE = \{fe_1, fe_2, \ldots, fe_m\}$, where $fe_i \in \mathbb{R}$ and $n, m, i \in \mathbb{N}$. The states of $K$ are labeled with a structure $st \in S_{rna}$ and an energy $fe \in FE$. A transition $s \to s'$, where $s, s'$ are states in $K$, implies that structure $st$ is transformed into $st'$ in the presence of $fe$. Here, the label of $s$ is $st$ and $fe$, and the label of $s'$ is $st'$.

4 SIMULATION

4.1 Data Preparation

The data for the simulation were obtained from the RNAeval web server (RNAeval Webserver), following a model of RNA structure (Mamuye et al., 2016). The program, part of the ViennaRNA Web Services (Gruber et al., 2008), allows the user to input any RNA sequence, and the server then calculates the energy of a given secondary structure. The thermodynamic description given by RNAeval was used to calculate the free energy of each structure (Mamuye et al., 2016). The free energy values make it possible to choose the optimal structure for each step in the graph transformation. The predicted optimal structure was validated by comparison with the one predicted by the RNAfold web server (Mamuye et al., 2016).


Table 1: Path from state 0 to state 6.

State  Energy (kcal/mol)  w(i)         w(p)         E(p)          Rate from current to next state
0      0.00               1            1            0             1.80E-4
1      4.80               0.000414653  1.000414653  -0.000110966  3.91E-3
2      2.90               0.009047271  1.009461924  -0.002520743  1.25E-3
3      3.60               0.002905783  1.012367707  -0.00329013   1.89E-6
4      7.60               4.41232E-6   1.012372119  -0.003291296  8.15E-6
5      6.70               1.90043E-5   1.012391124  -0.003296321  3.62E-6
6      7.20               8.44358E-6   1.012399567  -0.003298553  terminal state

Table 2: Path from state 0 to state 10.

State  Energy (kcal/mol)  w(i)         w(p)         E(p)          Rate from current to next state
0      0.00               1            1            0             6.35E-3
7      2.60               0.014720154  1.014720154  -0.003911389  7.64E-4
8      3.90               0.001785947  1.0165061    -0.004382081  1.19E-1
9      0.70               0.321177882  1.337683982  -0.77875129   1.85
10     -2.80              93.9761137   95.31379768  -1.219807747  terminal state

4.2 Computational Feasibility of the Model

The sample strand, CUUACCAUCGGGUUAGAGGAG, used for both the DTMC and CTMC models is taken from the literature (Mamuye et al., 2016). The energy values of each structure were calculated using the RNAeval server (Gruber et al., 2008). Figure 1 outlines two possible paths in a finite state machine for the RNA folding. The structural dynamics of the RNA strand begin in the unfolded state, $s_0$. States $s_6$ and $s_{10}$ have self-loops, which implies that there is no further folding.

Figure 1: Representation of RNA folding in a finite state machine.

Each arrow represents the formation of a loop and the bonding of bases, and each subsequent state represents the resulting change in the molecule's secondary structure. The two paths in the finite state machine are as follows:

Path from $s_0$ to $s_6$

1. $s_0 \to s_1$: Bonds form between the 1st and 21st bases and the 2nd and 20th bases to form a helix.

2. $s_1 \to s_2$: A bond forms between the 3rd and 19th bases to form a helix.

3. $s_2 \to s_3$: A bond forms between the 6th and 16th bases to form an internal loop.

4. $s_3 \to s_4$: A bond forms between the 7th and 13th bases to form a bulge.

5. $s_4 \to s_5$: A bond forms between the 8th and 12th bases to form a hairpin.

6. $s_5 \to s_6$: A bond forms between the 5th and 18th bases to form a bulge.

Path from $s_0$ to $s_{10}$

1. $s_0 \to s_7$: Bonds form between the 1st and 16th bases and the 2nd and 15th bases to form a helix.

2. $s_7 \to s_8$: A bond forms between the 4th and 13th bases to form an internal loop.

3. $s_8 \to s_9$: A bond forms between the 5th and 12th bases to form a helix.

4. $s_9 \to s_{10}$: A bond forms between the 6th and 11th bases to form a helix.

Each path is constructed from minimum free energies whose values differ, and hence there are two paths starting from $s_0$. In the simulation, the structures of RNA are represented symbolically. Note that the sample strand and the paths in the finite state machine representing RNA structural dynamics form a simple example, and the simulation results can be validated by comparing them to published values. The sample queries in the form of logic specifications are posed on the stochastic structures representing RNA structural dynamics. The structures are denoted by the states in the queries. The experiments were conducted on a system with an Intel Core i7 CPU at 2.11 GHz and 16 GB RAM. Tables 1 and 2 list the calculated values of rates and energy, where E(p), w(i), and w(p) denote the energy of the path, the weight of the ith state, and the weight of the path, respectively.

PCTL Formula       Meaning                                                            Result                     Time (seconds)
P = ? [F x]        "What is the probability of reaching x?"                           0.027397281 when x = s_6   0.001 when x = s_6
                                                                                      0.972602719 when x = s_10  0.01 when x = s_10
P > 0.5 [F s=x]    "Verify that the probability of reaching x is greater than 0.5."   false when x = s_6         0.002 when x = s_6
                                                                                      true when x = s_10         0.009 when x = s_10
P = ? [s U s]      "What is the probability that s_6 is reached before s_10?"         0.027397281                0.006
P = ? [s U s]      "What is the probability that s_10 is reached before state s_6?"   0.972602719                0.008
P = ? [s U s]      "What is the probability that s forms before s?"                   0.00                       0.005

Figure 2: Execution times and results for PCTL queries on the DTMC model.

CSL Formula                 Meaning                                                          Result       Time (sec)
P = ? [F x_6]               "What is the probability that the molecule will reach x_6?"      0.027397281  0.005
P = ? [F x_10]              "What is the probability that the molecule will reach x_10?"     0.972602719  0.002
P = ? [true U[4,4] x_6]     "What is the probability that x_6 exists at time instant 4?"     2.772E-25    0.003
P = ? [true U[4,4] x_10]    "What is the probability of x_10 at time instant 4?"             2.948E-6     0.002

Figure 3: Execution times and results for CSL queries on the CTMC model.

4.2.1 DTMC Model

In the DTMC model, each structure is represented by a state. Each transition is assigned a probability, defined by the equation given in (Kirkpatrick et al., 2013).

Definition 6. The equilibrium probability for each state $i$ is defined by

$$\frac{e^{-E(i)/RT}}{\sum_{j \in S} e^{-E(j)/RT}}$$

where:

1. $S$ is the set of states.

2. $i, j \in S$.

3. $E(i)$ is the energy of state $i$.

4. $R$ is the gas constant. In this case, $R$ is the product of Avogadro's number and the Boltzmann constant.

5. $T$ is the temperature. For this model, $T$ is approximately body temperature, 310.15 K.
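The formula in Definition 6 can be evaluated directly. The sketch below assumes a gas constant of 0.0019872 kcal/(mol K), chosen to match the kcal/mol energies in Tables 1 and 2, and assumes that the normalization runs over the two structures that can follow the unfolded state (states 1 and 7). Both choices are assumptions made for illustration; under them the branching probabilities come out close to the values reported for the DTMC model.

```python
import math

R_GAS = 0.0019872   # gas constant in kcal/(mol*K); assumed unit convention
TEMPERATURE = 310.15  # kelvin, as stated in Definition 6


def boltzmann_weight(energy):
    """Unnormalized weight e^(-E/RT) of a structure with the given energy."""
    return math.exp(-energy / (R_GAS * TEMPERATURE))


def equilibrium_probabilities(energies):
    """Normalize Boltzmann weights over the given set of states."""
    weights = {s: boltzmann_weight(e) for s, e in energies.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


# Energies of states 1 and 7 taken from Tables 1 and 2 (kcal/mol).
branch = equilibrium_probabilities({"s1": 4.80, "s7": 2.60})
```

Because every later transition on each path has a single successor, the probability of entering state 1 or state 7 under this assumption is also the probability of eventually reaching the corresponding terminal state.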

The DTMC model is used to compute the probability of the molecule terminating at either $s_6$ or $s_{10}$. The probability of the molecule reaching $s_6$ is 0.027397281. The probability of the molecule reaching $s_{10}$, the minimum free energy (MFE) structure, is 0.972602719. The model checker can also indicate whether a structure is likely to occur by verifying whether the probability is greater than one half. This is true only when the final structure is state 10. Additionally, the model can find the probability that one path will terminate before the other, i.e., the probability that $s_6$ forms before $s_{10}$ and vice versa. We observe that the probability that $s_6$ forms before $s_{10}$ is 0.027397281, and the probability that $s_{10}$ forms before $s_6$ is 0.972602719. Similarly, queries can confirm the order of states, which represents the order in which structures are formed; for example, the last query in Figure 2, which asks whether one state forms before another, evaluates to 0. Figure 2 shows the execution times for property verification on the model. The sample queries are reachability queries. The execution time for this simple model is within 1 second.

4.2.2 CTMC Model

In the CTMC model, each structure is represented by a state. The rates are taken from Table 1 and Table 2. The CTMC model incorporates transition rates that were calculated by following the process outlined in previous work (Entzian and Raden, 2020). The steps to calculate the transition rates are:

1. The Boltzmann weight, $w(i)$, of each structure is calculated. The weight is defined by $w(i) = e^{-E(i)/RT}$. The terms $i$, $E(i)$, $R$, and $T$ are as previously stated in Definition 6.

2. The weight of the path, $w(p)$, is calculated and is the sum of the Boltzmann weights of all structures up to and including that point in the path.

3. The energy of the path is calculated and is defined by $E(p) = -RT \log(w(p))$.

4. The transition rate is defined by a Metropolis rate, represented by $\min\left(1, e^{E(p) - E(p')}\right)$.
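The first three steps can be traced numerically for the path from state 0 to state 6. The sketch below is written under two assumptions: a gas constant of 0.0019872 kcal/(mol K), and a base-10 logarithm in step 3, since the text does not state the base and base 10 appears to match the tabulated E(p) values. The function name is illustrative, and the rate in step 4 is not computed here.

```python
import math

R_GAS = 0.0019872     # kcal/(mol*K); assumed unit convention
TEMPERATURE = 310.15  # kelvin
RT = R_GAS * TEMPERATURE


def path_table(energies):
    """Return rows (E(i), w(i), w(p), E(p)) for the states along one path."""
    rows = []
    path_weight = 0.0
    for energy in energies:
        weight = math.exp(-energy / RT)            # step 1: Boltzmann weight
        path_weight += weight                      # step 2: cumulative path weight
        path_energy = -RT * math.log10(path_weight)  # step 3, base 10 assumed
        rows.append((energy, weight, path_weight, path_energy))
    return rows


# Energies of states 0 to 6 from Table 1 (kcal/mol).
table1 = path_table([0.00, 4.80, 2.90, 3.60, 7.60, 6.70, 7.20])
for energy, weight, path_weight, path_energy in table1:
    print(f"{energy:5.2f}  {weight:.6e}  {path_weight:.9f}  {path_energy:.9f}")
```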

In addition to probabilities, the CTMC model can be used to incorporate time. For instance, at time instant 4, the probability of the RNA molecule existing in $s_6$ is 2.772E-25. Figure 3 shows the times recorded for sample CSL queries on the simulation model. The execution time for each sample query is less than 0.01 second. The model is therefore computationally feasible for this simple example, which suggests that experiments on larger problem sizes can be performed.
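Transient probabilities of this kind can be approximated outside a model checker with uniformization, a standard technique for CTMCs. The sketch below is not the PRISM model used in the experiments: it assumes that the rates in Tables 1 and 2 label the transitions between consecutive states on each path and that the terminal states are absorbing, and the function name is illustrative. Under these assumptions the probability of being in state 6 at time 4 should come out close to the value reported in Figure 3; the tabulated rates are rounded, so exact agreement is not expected.

```python
import math


def transient_distribution(rates, n_states, start, t, terms=200):
    """State distribution of a CTMC at time t, computed by uniformization.

    rates maps (i, j) to the rate of the transition from state i to state j.
    """
    exit_rate = [0.0] * n_states
    for (i, j), r in rates.items():
        if i != j:
            exit_rate[i] += r
    q = max(exit_rate)  # uniformization rate
    dist = [0.0] * n_states
    if q == 0.0:
        dist[start] = 1.0
        return dist

    def step(vec):
        # One step of the uniformized DTMC, P = I + Q / q.
        out = [vec[i] * (1.0 - exit_rate[i] / q) for i in range(n_states)]
        for (i, j), r in rates.items():
            if i != j:
                out[j] += vec[i] * r / q
        return out

    vec = [0.0] * n_states
    vec[start] = 1.0
    weight = math.exp(-q * t)  # Poisson probability of zero jumps
    for k in range(terms):
        for i in range(n_states):
            dist[i] += weight * vec[i]
        vec = step(vec)
        weight *= q * t / (k + 1)
    return dist


# Rates transcribed from Tables 1 and 2; states 6 and 10 are absorbing.
rates = {
    (0, 1): 1.80e-4, (1, 2): 3.91e-3, (2, 3): 1.25e-3,
    (3, 4): 1.89e-6, (4, 5): 8.15e-6, (5, 6): 3.62e-6,
    (0, 7): 6.35e-3, (7, 8): 7.64e-4, (8, 9): 1.19e-1, (9, 10): 1.85,
}
at_time_4 = transient_distribution(rates, n_states=11, start=0, t=4.0)
print(f"P(state 6 at t = 4) = {at_time_4[6]:.3e}")
```

Uniformization only adds non-negative terms, which keeps very small probabilities such as this one free of cancellation errors.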

5 CONCLUSION

The formalism for RNA structure prediction using graph rewriting provided insights into how a computationally feasible model can be implemented. The model also demonstrates how uncertainty can be incorporated and quantified in terms of probabilities. A model defined by rewriting rules in the PRISM model checker will become more useful when different initial RNA strands are used as input to validate the formalism. The PCTL and CSL logics are able to express different and complicated properties of the system. The formalism provides a foundation for a rigorous evaluation of RNA structure prediction. Future work will include experiments on large datasets of RNA structures.

ACKNOWLEDGEMENTS

A part of this project was supported by grant P20GM103499-20 (SC-INBRE) from the National Institute of General Medical Sciences, National Institutes of Health (NIH). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. KG was supported by NSF CCF-2227898 for part of the work.

REFERENCES

RNAeval Webserver. http://rna.tbi.univie.ac.at/cgi-bin/

RNAWebSuite/RNAeval.cgi , last accessed on

10/17/2022.

Aziz, A., Sanwal, K., Singhal, V., and Brayton, R. (1996).

Verifying continuous time markov chains. In Inter-

national Conference on Computer Aided Veriﬁcation,

pages 269–276. Springer.

Aziz, A., Singhal, V., Balarin, F., Brayton, R. K., and

Sangiovanni-Vincentelli, A. L. (1995). It usually

works: The temporal logic of stochastic systems. In

International Conference on Computer Aided Veriﬁ-

cation, pages 155–165. Springer.

Baier, C., Katoen, J.-P., and Hermanns, H. (1999). Approx-

imative symbolic model checking of continuous-time

markov chains. In International Conference on Con-

currency Theory, pages 146–161. Springer.

Baier, C., Katoen, J.-P., and Larsen, K. G. (2008). Princi-

ples of model checking. MIT press.

Bonnet, E., Rzazewski, P., and Sikora, F. (2020). Designing

rna secondary structures is hard. Journal of Computa-

tional Biology, 27(3):302–316.

Chabrier-Rivier, N., Chiaverini, M., Danos, V., Fages, F., and Schächter, V. (2004). Modeling and querying biomolecular interaction networks. Theoretical Computer Science, 325(1):25–44.

Danos, V., Feret, J., Fontana, W., Harmer, R., Hayman,

J., Krivine, J., Thompson-Walsh, C., and Winskel, G.

(2012). Graphs, rewriting and pathway reconstruc-

tion for rule-based models. In FSTTCS 2012-IARCS

Annual Conference on Foundations of Software Tech-

nology and Theoretical Computer Science, volume 18,

pages 276–288.

Dowell, R. D. and Eddy, S. R. (2004). Evaluation of several

lightweight stochastic context-free grammars for rna

secondary structure prediction. BMC bioinformatics,

5(1):1–14.

Entzian, G. and Raden, M. (2020). pourrna—a time-and

memory-efﬁcient approach for the guided exploration

of rna energy landscapes. Bioinformatics, 36(2):462–

469.

Fu, X., Wang, H., Harrison, R. W., and Harrison, W. L.

(2008). A rule-based approach for rna pseudoknot

prediction. International journal of data mining and

bioinformatics, 2(1):78–93.

Ehrig, H., Ehrig, K., Prange, U., and Taentzer, G. (2006). Fundamentals of algebraic graph transformation. Monographs in Theoretical Computer Science. An EATCS Series. Springer.

Ganesh, V., O’donnell, C. W., Soos, M., Devadas, S., Ri-

nard, M. C., and Solar-Lezama, A. (2012). Lynx:

A programmatic sat solver for the rna-folding prob-

lem. In International Conference on Theory and

Applications of Satisﬁability Testing, pages 143–156.

Springer.

Gruber, A. R., Lorenz, R., Bernhart, S. H., Neuböck, R., and Hofacker, I. L. (2008). The vienna rna websuite. Nucleic acids research, 36(suppl 2):W70–W74.

Hansson, H. and Jonsson, B. (1994). A logic for reasoning

about time and reliability. Formal aspects of comput-

ing, 6(5):512–535.

Heath, J., Kwiatkowska, M., Norman, G., Parker, D., and

Tymchyshyn, O. (2008). Probabilistic model checking

of complex biological pathways. Theoretical Com-

puter Science, 391(3):239–257.

Jonoska, N., Obatake, N., Poznanović, S., Price, C., Riehl, M., and Vazquez, M. (2021). Modeling rna: Dna hybrids with formal grammars. In Using Mathematics to Understand Biological Complexity, pages 35–54. Springer.


Kirkpatrick, B., Hajiaghayi, M., and Condon, A. (2013).

A new model for approximating rna folding trajecto-

ries and population kinetics. Computational Science

& Discovery, 6(1):014003.

Krause, C. and Giese, H. (2012). Probabilistic graph trans-

formation systems. In International Conference on

Graph Transformation, pages 311–325. Springer.

Kwiatkowska, M. (2003). Model checking for probability

and time: from theory to practice. In 18th Annual

IEEE Symposium of Logic in Computer Science, 2003.

Proceedings., pages 351–360. IEEE.

Kwiatkowska, M. and Thachuk, C. (2014). Probabilis-

tic model checking for biology. In Software Systems

Safety, pages 165–189. IOS Press.

Lyngsø, R. B. and Pedersen, C. N. (2000). Pseudoknots

in rna secondary structures. In Proceedings of the

fourth annual international conference on Computa-

tional molecular biology, pages 201–209.

Mamuye, A., Merelli, E., and Tesei, L. (2016). A graph

grammar for modelling rna folding. Electronic Pro-

ceedings in Theoretical Computer Science, 231:31–

41.

Maximova, M., Giese, H., and Krause, C. (2018). Prob-

abilistic timed graph transformation systems. Jour-

nal of logical and algebraic methods in programming,

101:110–131.

Parker, D. A. (2003). Implementation of symbolic model

checking for probabilistic systems. PhD thesis, Uni-

versity of Birmingham.

Quadrini, M., Tesei, L., and Merelli, E. (2019). An alge-

braic language for rna pseudoknots comparison. BMC

bioinformatics, 20(4):1–18.

Quadrini, M., Tesei, L., and Merelli, E. (2020). Aspralign: a

tool for the alignment of rna secondary structures with

arbitrary pseudoknots. Bioinformatics, 36(11):3578–

3579.

Riddihough, G. (2016). Signals in rna. Science,

352(6292):1406–1407.

Rogers, E., Murrugarra, D., and Heitsch, C. (2017). Con-

ditioning and robustness of rna boltzmann sampling

under thermodynamic parameter perturbations. Bio-

physical journal, 113(2):321–329.

Sato, K., Akiyama, M., and Sakakibara, Y. (2021). Rna sec-

ondary structure prediction using deep learning with

thermodynamic integration. Nature communications,

12(1):1–9.

Zhao, Q., Zhao, Z., Fan, X., Yuan, Z., Mao, Q., and Yao, Y.

(2021). Review of machine learning methods for rna

secondary structure prediction. PLoS computational

biology, 17(8):e1009291.
