Theoretical Study of the Fidelity of Transcription

Yao-Gen Shu (1,2), Ming Li and Zhong-Can Ou-Yang

Bioinformatics Laboratory of Yishang Innovation Technology Co., Ltd, Beijing 100081, China

Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China

School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China

Keywords:

Gene Transcription Fidelity, First-passage Approach, First-order Neighbor Effects.

Abstract:

This year we celebrate the 50th anniversary of the discovery of the three eukaryotic RNA polymerases. Ever since this seminal discovery by Robert Roeder in 1969 (Roeder and Rutter, 1969), researchers have investigated the intricate mechanisms of gene transcription with great dedication. However, there has still been no breakthrough in the study of the fidelity of transcription. Here, we propose the simplest model with first-order neighbor effects, based on a first-passage approach, to investigate the fidelity of gene transcription theoretically.

1 INTRODUCTION

Transcription is the process in which a gene’s DNA sequence is transcribed by an RNA polymerase (RNAp). RNAp uses one of the DNA strands as a template to make a new, complementary RNA molecule. The transcription cycle includes three phases: initiation, elongation, and termination. The initiation phase involves recognition of promoter DNA, DNA opening, and synthesis of a short initial RNA oligomer. During the elongation phase, the polymerase uses the DNA template to extend the growing RNA chain in a processive manner. Finally, DNA and RNA are released during termination, and the polymerase can then be recycled and re-initiate transcription (Cramer, 2019).

The probability of a mismatch during transcription is higher than that during DNA replication, because transcription errors are not inherited and are therefore usually not fatal. When a mismatched nucleotide is added to the nascent RNA, the RNAp halts and then either proofreads the nascent chain or continues without correcting it. There are two major ways of proofreading: pyrophosphorolytic editing and hydrolytic editing. Once an incorrect nucleotide is added, the first way to rectify the mistake is pyrophosphorolytic editing. In this, a pyrophosphate (PPi) enters the active cleft and attacks the wrongly added nucleotide, i.e. an NMP (such as AMP, GMP, CMP or UMP), converting it back to an NTP. The incorrect nucleotide is thus removed, and the RNAp can then add a correct nucleotide. If pyrophosphorolytic editing cannot rectify the mRNA sequence, another method, called hydrolytic editing, is activated. In this proofreading, certain proteins belonging to the Gre and Nus protein families are activated: if the correct base pair is not added, the RNAp halts, and these proteins enter the active cleft of the core enzyme. They remove 4 nucleotides from the nascent RNA chain, after which the RNAp adds the correct nucleotides again (Libby and Gallant, 1991; von Hippel, 1998).

It is now widely acknowledged that matches (A→U, T→A and G↔C, denoted as Right (R) pairs) play a dominant role in transcription, while mismatches (denoted as Wrong (W) pairs) occur with very low probability. This is not due to the difference between the free energies of R and W pairs in the double chain: in fact, this free-energy difference is only about 2 ∼ 4 k_B T (where k_B and T are the Boltzmann constant and the absolute temperature, respectively), which cannot account for such a low mismatch probability if it is estimated by the Boltzmann factor. As pointed out by J. Hopfield (Hopfield, 1974) and J. Ninio (Ninio, 1975) in studies of the fidelity of DNA polymerases, the low mismatch probability may instead originate from the huge difference in replication kinetics between R and W (I. R. Lehman and Kornberg, 1958; Kunkel and Bebenek, 2000).
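To make the Boltzmann-factor argument concrete, the short Python sketch below evaluates an equilibrium estimate of the mismatch probability for free-energy penalties of 2, 3 and 4 k_B T. It assumes the simplest two-state picture (one R and one W candidate per site, with W penalized by ΔG), so P(W) = e^(−ΔG/k_B T) / (1 + e^(−ΔG/k_B T)); the function name is illustrative only.

```python
import math


def equilibrium_mismatch_probability(delta_g_kt: float) -> float:
    """Two-state Boltzmann estimate of the W (mismatch) probability.

    Assumption: each site has one R and one W candidate, and W costs
    delta_g_kt (in units of k_B T) more free energy than R.
    """
    w = math.exp(-delta_g_kt)  # Boltzmann weight of W relative to R
    return w / (1.0 + w)


for dg in (2.0, 3.0, 4.0):
    print(f"dG = {dg:.0f} kT -> P(W) ~ {equilibrium_mismatch_probability(dg):.3f}")
# dG = 2 kT -> P(W) ~ 0.119
# dG = 3 kT -> P(W) ~ 0.047
# dG = 4 kT -> P(W) ~ 0.018
```

Even at the upper end of the quoted range, this equilibrium estimate leaves a mismatch probability of roughly 2%, which is why a kinetic explanation is needed.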

The fidelity of DNA replication has been investigated kinetically in recent years (Y. G. Shu and Li, 2015; Y. S. Song and Li, 2017; Gaspard, 2017; Q. S. Li and Li, 2019; M. Li and Shu, 2018). However, the fidelity of transcription is rarely studied. Here, we propose the simplest model to study the fidelity of transcription theoretically.

Shu, Y., Li, M. and Ou-Yang, Z.

Theoretical Study of the Fidelity of Transcription.

DOI: 10.5220/0009181002390241

In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 3: BIOINFORMATICS, pages 239-241

ISBN: 978-989-758-398-8; ISSN: 2184-4305

Figure 1: Kinetics of transcription in our model. At the end of the initiation phase (after the synthesis of a short initial RNA oligomer, i.e., the correct nucleation), there is a reflecting boundary at the start of elongation, which corresponds to Eqs. (1) and (2). For i ∈ [2, 4], there is no hydrolytic editing if an incorrect nucleotide is added, which corresponds to Eqs. (3) and (4). For i ∈ [5, L − 6], the normal elongation phase is described by Eqs. (5) and (6). For i ∈ [L − 5, L − 2], there is no contribution from hydrolytic editing of longer chains, owing to the absorbing boundary at termination, as given in Eqs. (7) and (8). At the absorbing boundary (termination, i = L − 1), the kinetics is described by Eqs. (9) and (10).

2 FIRST-PASSAGE MODEL

For brevity and without loss of generality, we suppose that the gene consists of two kinds of units, A (denoting A or T) and G (denoting G or C). Correspondingly, two kinds of monomers, U (denoting U or A) and C (denoting C or G), are added to the end of the growing chain and paired with A or G to transcribe the gene with L bases. Since U pairs with A much more readily than with G, we denote U paired with A as the match pair 1 and U paired with G as the mismatch pair 0. Similarly, we denote C paired with G as 1 and C paired with A as 0. Besides, we only consider the first-order copolymerization processes, as follows: the adding rate of a correct nucleotide at a match end is $k_{11}$; that of a correct nucleotide at a mismatch end is $k_{01}$; the adding rate of an incorrect nucleotide at a match end is $k_{10}$; and that of an incorrect nucleotide at a mismatch end is $k_{00}$.
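The binary encoding above can be written down directly. The Python sketch below (the names `PAIR_LABEL`, `pair_labels` and `addition_rate` are hypothetical) maps a template written in the reduced alphabet {A, G} and an RNA chain in {U, C} to the labels α_j, and looks up the first-order adding rate from the label of the terminal pair and the label of the new pair. The rate values are placeholders, not measured parameters.

```python
# Reduced alphabet of the model: template units A (A or T) and G (G or C),
# RNA monomers U (U or A) and C (C or G).
PAIR_LABEL = {
    ("A", "U"): 1,  # match
    ("G", "C"): 1,  # match
    ("G", "U"): 0,  # mismatch
    ("A", "C"): 0,  # mismatch
}


def pair_labels(template: str, rna: str) -> list[int]:
    """Return alpha_1..alpha_n for an RNA chain aligned to the start of a template."""
    if len(rna) > len(template):
        raise ValueError("RNA chain is longer than the template")
    return [PAIR_LABEL[(d, m)] for d, m in zip(template, rna)]


def addition_rate(prev_label: int, new_label: int,
                  k: dict[tuple[int, int], float]) -> float:
    """First-order adding rate: depends only on the terminal pair and the new pair."""
    return k[(prev_label, new_label)]


# Hypothetical placeholder rates (arbitrary units), keyed by (terminal label, new label).
k = {(1, 1): 1.0, (0, 1): 0.1, (1, 0): 1e-3, (0, 0): 1e-4}
labels = pair_labels("AGGA", "UCUU")
print(labels)  # [1, 1, 0, 1]
print(addition_rate(labels[-1], 0, k))  # incorrect nucleotide added at a match end
```

Keying the rates by the pair (terminal label, new label) is what makes the model first-order: only the last pair of the growing chain influences the next addition.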

Since transcription proceeds unidirectionally, we assume that the nascent chain initiates from a pre-existing seed (a short initial RNA oligomer) after the initiation phase, then elongates as a template-directed binary copolymer, and terminates when its length reaches L. In the elongation phase, the monomer U or C can be added to or deleted from the growing end owing to proofreading, where $k_p$ and $k_h$ denote the rates of pyrophosphorolytic editing and of hydrolytic editing (which removes 4 bases), respectively. In contrast, the initial seed and the last-added monomer cannot be deleted. In other words, this is a first-passage process from a reflecting boundary at the first position to an absorbing boundary at the last position. It is worth noting that the initiation and termination here are purely idealized to simplify the mathematical treatment and do not describe the actual initiation and termination of transcription. The first-passage treatment greatly simplifies the calculations by introducing a closed set of kinetic equations, which makes it convenient for approximate calculations. The above kinetic framework is shown in Fig. 1.
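One way to read the kinetic framework of Fig. 1 is as a set of transition rules for a continuous-time Markov chain on the label sequences α_1⋯α_i. The Python sketch below encodes one such reading. It assumes that (i) pyrophosphorolytic editing removes only a mismatched terminal monomer, at rate $k_p$, and never the seed; (ii) hydrolytic editing removes 4 monomers at rate $k_h$, only after a terminal mismatch and only from chains of length at least 5, so the seed survives; and (iii) chains are absorbed once they reach the termination index L − 1 of Fig. 1. The function name `transitions` and its argument layout are hypothetical.

```python
Chain = tuple[int, ...]


def transitions(seq: Chain, L: int, k: dict[tuple[int, int], float],
                kp: float, kh: float) -> list[tuple[Chain, float]]:
    """All moves out of chain `seq` (labels 0/1) with their rates.

    Assumed rules (a reading of Fig. 1, not a transcription of the paper):
    position 1 is a reflecting seed, chains of length L - 1 are absorbing,
    kp removes one mismatched terminal monomer, kh removes four monomers
    after a terminal mismatch.
    """
    if len(seq) >= L - 1:                     # absorbing boundary: no further moves
        return []
    moves = [(seq + (b,), k[(seq[-1], b)]) for b in (1, 0)]  # first-order addition
    if seq[-1] == 0 and len(seq) >= 2:        # the seed at position 1 is never removed
        moves.append((seq[:-1], kp))
    if seq[-1] == 0 and len(seq) >= 5:        # removing 4 bases must leave the seed
        moves.append((seq[:-4], kh))
    return moves


k = {(1, 1): 1.0, (0, 1): 0.1, (1, 0): 1e-3, (0, 0): 1e-4}  # hypothetical rates
for target, rate in transitions((1, 1, 1, 1, 0), L=12, k=k, kp=0.5, kh=0.05):
    print(target, rate)
```

For the chain 1 1 1 1 0 the sketch lists two additions, one pyrophosphorolytic removal back to 1 1 1 1, and one hydrolytic removal back to the seed.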

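The probability that the growing RNA sequence $\alpha_1\alpha_2\cdots\alpha_i$ (where $1 \le i \le L$; $\alpha_j = 1$ denotes a matched pair, i.e. A→U, T→A or G↔C, and $\alpha_j = 0$ otherwise) appears in the transcription process along the gene $D_1\cdots D_i$ (each $D_j$ denotes A, T, C or G) at time $t$ is denoted as $\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_i}(t)$.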

Now we have the following master equations.

At the end of the initiation phase (after the synthesis of a short initial RNA oligomer, i.e., the correct nucleation), there is a reflecting boundary at the start of elongation (i = 1):

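$$\frac{d}{dt}\rho^{D_1}_{1} = k_p\,\rho^{D_1D_2}_{10} + k_h\sum_{\alpha_2\alpha_3\alpha_4}\rho^{D_1\cdots D_5}_{1\alpha_2\alpha_3\alpha_4 0} - \left(k_{11}+k_{10}\right)\rho^{D_1}_{1} \tag{1}$$

$$\frac{d}{dt}\rho^{D_1}_{0} = k_p\,\rho^{D_1D_2}_{00} + k_h\sum_{\alpha_2\alpha_3\alpha_4}\rho^{D_1\cdots D_5}_{0\alpha_2\alpha_3\alpha_4 0} - \left(k_{01}+k_{00}\right)\rho^{D_1}_{0} \tag{2}$$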

For 2 ≤ i ≤ 4, there is no hydrolytic editing when an incorrect nucleotide is added, since removing 4 bases would reach into the seed:

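$$\frac{d}{dt}\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}1} = k_{\alpha_{i-1}1}\,\rho^{D_1\cdots D_{i-1}}_{\alpha_1\cdots\alpha_{i-1}} + k_p\,\rho^{D_1\cdots D_{i+1}}_{\alpha_1\cdots\alpha_{i-1}10} + k_h\sum_{\alpha_{i+1}\alpha_{i+2}\alpha_{i+3}}\rho^{D_1\cdots D_{i+4}}_{\alpha_1\cdots\alpha_{i-1}1\alpha_{i+1}\alpha_{i+2}\alpha_{i+3}0} - \left(k_{11}+k_{10}\right)\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}1} \tag{3}$$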

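$$\frac{d}{dt}\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}0} = k_{\alpha_{i-1}0}\,\rho^{D_1\cdots D_{i-1}}_{\alpha_1\cdots\alpha_{i-1}} + k_p\,\rho^{D_1\cdots D_{i+1}}_{\alpha_1\cdots\alpha_{i-1}00} + k_h\sum_{\alpha_{i+1}\alpha_{i+2}\alpha_{i+3}}\rho^{D_1\cdots D_{i+4}}_{\alpha_1\cdots\alpha_{i-1}0\alpha_{i+1}\alpha_{i+2}\alpha_{i+3}0} - \left(k_{01}+k_{00}+k_p\right)\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}0} \tag{4}$$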

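In the elongation phase, for 5 ≤ i ≤ L − 6,

$$\frac{d}{dt}\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}1} = k_{\alpha_{i-1}1}\,\rho^{D_1\cdots D_{i-1}}_{\alpha_1\cdots\alpha_{i-1}} + k_p\,\rho^{D_1\cdots D_{i+1}}_{\alpha_1\cdots\alpha_{i-1}10} + k_h\sum_{\alpha_{i+1}\alpha_{i+2}\alpha_{i+3}}\rho^{D_1\cdots D_{i+4}}_{\alpha_1\cdots\alpha_{i-1}1\alpha_{i+1}\alpha_{i+2}\alpha_{i+3}0} - \left(k_{11}+k_{10}\right)\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}1} \tag{5}$$

$$\frac{d}{dt}\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}0} = k_{\alpha_{i-1}0}\,\rho^{D_1\cdots D_{i-1}}_{\alpha_1\cdots\alpha_{i-1}} + k_p\,\rho^{D_1\cdots D_{i+1}}_{\alpha_1\cdots\alpha_{i-1}00} + k_h\sum_{\alpha_{i+1}\alpha_{i+2}\alpha_{i+3}}\rho^{D_1\cdots D_{i+4}}_{\alpha_1\cdots\alpha_{i-1}0\alpha_{i+1}\alpha_{i+2}\alpha_{i+3}0} - \left(k_{01}+k_{00}+k_p+k_h\right)\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}0} \tag{6}$$

BIOINFORMATICS 2020 - 11th International Conference on Bioinformatics Models, Methods and Algorithms

For L − 5 ≤ i ≤ L − 2, there is no hydrolytic-editing gain term, owing to the absorbing boundary at termination (for the same reason, the $k_p$ gain term vanishes at $i = L - 2$, since absorbed chains cannot be edited):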

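$$\frac{d}{dt}\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}1} = k_{\alpha_{i-1}1}\,\rho^{D_1\cdots D_{i-1}}_{\alpha_1\cdots\alpha_{i-1}} + k_p\,\rho^{D_1\cdots D_{i+1}}_{\alpha_1\cdots\alpha_{i-1}10} - \left(k_{11}+k_{10}\right)\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}1} \tag{7}$$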

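$$\frac{d}{dt}\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}0} = k_{\alpha_{i-1}0}\,\rho^{D_1\cdots D_{i-1}}_{\alpha_1\cdots\alpha_{i-1}} + k_p\,\rho^{D_1\cdots D_{i+1}}_{\alpha_1\cdots\alpha_{i-1}00} - \left(k_{01}+k_{00}+k_p+k_h\right)\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}0} \tag{8}$$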

At the absorbing boundary (termination, i = L − 1),

$$\frac{d}{dt}\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}1} = k_{\alpha_{i-1}1}\,\rho^{D_1\cdots D_{i-1}}_{\alpha_1\cdots\alpha_{i-1}} \tag{9}$$

$$\frac{d}{dt}\rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_{i-1}0} = k_{\alpha_{i-1}0}\,\rho^{D_1\cdots D_{i-1}}_{\alpha_1\cdots\alpha_{i-1}} \tag{10}$$
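For short genes, the long-time limit of these master equations can also be obtained exactly, because absorption probabilities of a finite Markov chain solve a linear system: if Q_TT collects the rates among transient chains and Q_TA the rates into absorbed chains, the absorption probabilities are (−Q_TT)⁻¹ Q_TA. The NumPy sketch below assumes one reading of Eqs. (1)-(10) (terminal mismatches removed at rate $k_p$ except at the seed, four monomers removed at rate $k_h$ after a terminal mismatch for chains of length at least 5, absorption at i = L − 1). It enumerates all 2^i chains, so it is practical only for small L. Because the six rates do not depend on the template units, the template does not appear explicitly. The function names are hypothetical.

```python
import itertools

import numpy as np


def transitions(seq, L, k, kp, kh):
    """Moves out of chain `seq` under the assumed rules: first-order addition,
    kp removal of a terminal mismatch (not the seed), kh removal of four
    monomers after a terminal mismatch (length >= 5), absorption at L - 1."""
    if len(seq) >= L - 1:
        return []
    moves = [(seq + (b,), k[(seq[-1], b)]) for b in (1, 0)]
    if seq[-1] == 0 and len(seq) >= 2:
        moves.append((seq[:-1], kp))
    if seq[-1] == 0 and len(seq) >= 5:
        moves.append((seq[:-4], kh))
    return moves


def final_distribution(L, k, kp, kh, q1):
    """Long-time limit P over absorbed chains of length L - 1 (requires L >= 3)."""
    transient = [s for n in range(1, L - 1)
                 for s in itertools.product((0, 1), repeat=n)]
    absorbed = list(itertools.product((0, 1), repeat=L - 1))
    t_idx = {s: j for j, s in enumerate(transient)}
    a_idx = {s: j for j, s in enumerate(absorbed)}
    Q_tt = np.zeros((len(transient), len(transient)))
    Q_ta = np.zeros((len(transient), len(absorbed)))
    for s, row in t_idx.items():
        for target, rate in transitions(s, L, k, kp, kh):
            Q_tt[row, row] -= rate             # total exit rate on the diagonal
            if target in t_idx:
                Q_tt[row, t_idx[target]] += rate
            else:
                Q_ta[row, a_idx[target]] += rate
    B = np.linalg.solve(-Q_tt, Q_ta)          # absorption probabilities per start chain
    p0 = np.zeros(len(transient))
    p0[t_idx[(1,)]] = q1                      # rho_1(t = 0) = q_1
    p0[t_idx[(0,)]] = 1.0 - q1                # rho_0(t = 0) = q_0 = 1 - q_1
    return dict(zip(absorbed, p0 @ B))


k = {(1, 1): 1.0, (0, 1): 0.1, (1, 0): 1e-3, (0, 0): 1e-4}  # hypothetical rates
P = final_distribution(L=10, k=k, kp=0.5, kh=0.05, q1=1.0)
print(sum(P.values()))  # total probability, should equal 1 up to rounding
```

A direct linear solve is chosen here instead of integrating the equations in time, since only the t → ∞ limit is needed.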

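One of our major concerns is the final distribution of the nascent RNA sequence, i.e., the long-time limit $P^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_i} = \rho^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_i}(t \to \infty)$, which is nonzero only for chains that have reached the absorbing boundary. To calculate it, we assume the initial conditions $\rho^{D_1}_{\alpha_1}(t = 0) \equiv q_{\alpha_1}$, where $q_1 + q_0 = 1$. The value of $q_1$ can be chosen arbitrarily because it has a negligible impact on the fidelity profile except at a few positions near the reflecting boundary. Thus, $P^{D_1\cdots D_i}_{\alpha_1\cdots\alpha_i}$ can be calculated by simulation with the known parameters $L$, $k_{11}$, $k_{01}$, $k_{10}$, $k_{00}$, $k_p$ and $k_h$.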
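A minimal stochastic sketch of such a simulation is shown below in Python. It assumes the same reading of the master equations as elsewhere (removal of a terminal mismatch at rate $k_p$ except at the seed, removal of four monomers after a terminal mismatch at rate $k_h$ for chains of length at least 5, absorption at i = L − 1). Because only the final chain matters, it samples the embedded jump chain of the Gillespie algorithm (each move chosen with probability proportional to its rate) and skips the waiting times. The function names and rate values are hypothetical.

```python
import random
from collections import Counter


def transitions(seq, L, k, kp, kh):
    """Moves out of chain `seq` under the assumed rules: first-order addition,
    kp removal of a terminal mismatch (not the seed), kh removal of four
    monomers after a terminal mismatch (length >= 5), absorption at L - 1."""
    if len(seq) >= L - 1:
        return []
    moves = [(seq + (b,), k[(seq[-1], b)]) for b in (1, 0)]
    if seq[-1] == 0 and len(seq) >= 2:
        moves.append((seq[:-1], kp))
    if seq[-1] == 0 and len(seq) >= 5:
        moves.append((seq[:-4], kh))
    return moves


def sample_final_chain(L, k, kp, kh, q1, rng):
    """Run one first-passage trajectory from the seed until absorption at length L - 1."""
    seq = (1,) if rng.random() < q1 else (0,)      # initial condition q_1, q_0 = 1 - q_1
    while len(seq) < L - 1:
        moves = transitions(seq, L, k, kp, kh)
        targets = [t for t, _ in moves]
        rates = [r for _, r in moves]
        seq = rng.choices(targets, weights=rates)[0]  # jump with probability rate / total
    return seq


rng = random.Random(2020)
k = {(1, 1): 1.0, (0, 1): 0.1, (1, 0): 1e-3, (0, 0): 1e-4}  # hypothetical rates
samples = Counter(sample_final_chain(30, k, 0.5, 0.05, 1.0, rng) for _ in range(2000))
print(samples.most_common(3))
```

Dividing each count by the number of trajectories gives a sampled estimate of the final distribution P.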

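We define the fidelity of limited-length transcription, in percent, as

$$\text{fidelity of transcription} \equiv \frac{1}{L-1}\sum_{i=1}^{L-1} P_i \times 100\%, \qquad P_i \equiv \sum_{\{\alpha\}:\,\alpha_i = 1} P^{D_1\cdots D_{L-1}}_{\alpha_1\cdots\alpha_{L-1}}, \tag{11}$$

where $P_i$ is the probability that position $i$ of the final (absorbed) chain carries a matched pair. This definition differs from that used for DNA replication.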
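Assuming that Eq. (11) averages, over the positions of the final chain, the probability that a position carries a matched pair, the fidelity can be evaluated from any final distribution P, whether exact or sampled. The Python helper below is a hypothetical sketch of that computation; it takes a mapping from final chains (tuples of 0/1 labels) to probabilities and also returns the positional profile.

```python
from collections.abc import Mapping


def transcription_fidelity(P: Mapping[tuple[int, ...], float]) -> tuple[float, list[float]]:
    """Return (fidelity in percent, positional match probabilities P_i).

    P maps each final chain alpha_1..alpha_n to its probability; all chains
    are assumed to have the same length n and the probabilities to sum to 1.
    """
    n = len(next(iter(P)))
    positional = [sum(p for chain, p in P.items() if chain[i] == 1) for i in range(n)]
    return 100.0 * sum(positional) / n, positional


fid, profile = transcription_fidelity({(1, 1, 1): 0.5, (1, 0, 1): 0.25, (1, 1, 0): 0.25})
print(fid, profile)  # fidelity in percent, then P_1, P_2, P_3
```

Returning the profile alongside the average makes it easy to check the claim that the choice of $q_1$ only affects positions near the reflecting boundary.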

3 DISCUSSION

Though mismatches in transcription are not fatal, they are closely related to cancer. In this manuscript, we proposed a general approach, based on the first-passage description of the transcription process, to calculate the positional fidelity for any given gene. The mathematical treatment is given by Eqs. (1)-(10) with 6 parameters. Although we only consider first-order copolymerization processes, the approach can be extended to higher-order copolymerization as in Ref. (Q. S. Li and Li, 2019).

We have neither performed the simulation nor solved the equations analytically by iteration as in Refs. (Gaspard, 2017; Q. S. Li and Li, 2019), because experimental data, such as the parameters and the fidelity, are lacking. This is therefore a preliminary theoretical investigation. We hope it will motivate experimenters to measure the parameters and the fidelity defined above quantitatively.

ACKNOWLEDGEMENTS

The authors acknowledge financial support from the Key Research Program of Frontier Sciences of CAS (No. Y7Y1472Y61), the National Natural Science Foundation of China (Nos. 11574329, 11774358, 11675180), the CAS Strategic Priority Research Program (No. XDA17010504), and the CAS Biophysics Interdisciplinary Innovation Team Project (No. 2060299).

REFERENCES

Cramer, P. (2019). Eukaryotic transcription turns 50. Cell, 179:808.

Gaspard, P. (2017). Iterated function systems for DNA replication. Phys. Rev. E, 96:042403.

Hopfield, J. J. (1974). Kinetic proofreading: A new mechanism for reducing errors in biosynthetic processes requiring high specificity. PNAS, 71:4135.

I. R. Lehman, M. J. Bessman, E. S. S. and Kornberg, A. (1958). Enzymatic synthesis of deoxyribonucleic acid: I. Preparation of substrates and partial purification of an enzyme from Escherichia coli. J. Biol. Chem., 233:163.

Kunkel, T. A. and Bebenek, K. (2000). DNA replication fidelity. Ann. Rev. Biochem., 69:497.

Libby, R. T. and Gallant, J. A. (1991). The role of RNA polymerase in transcriptional fidelity. Mol. Micro., 5:999.

M. Li, Z. C. O.-Y. and Shu, Y. G. (2018). Study on the fidelity of biodevice T7 DNA polymerase. BIOSTEC 2018, Volume 3: BIOINFORMATICS:135.

Ninio, J. (1975). Kinetic amplification of enzyme discrimination. Biochimie, 57:587.

Q. S. Li, P. D. Zheng, Y. G. S., Z. C. O.-Y. and Li, M. (2019). Template-specific fidelity of DNA replication with high-order neighbor effects: A first-passage approach. Phys. Rev. E, 100:012131.

Roeder, R. G. and Rutter, W. J. (1969). Multiple forms of DNA-dependent RNA polymerase in eukaryotic organisms. Nature, 224:234.

von Hippel, P. H. (1998). An integrated model of the transcription complex in elongation, termination, and editing. Science, 281:660.

Y. G. Shu, Y. S. Song, Z. C. O.-Y. and Li, M. (2015). A general theory of kinetics and thermodynamics of steady-state copolymerization. J. Phys.: Condens. Matter, 27:235105.

Y. S. Song, Y. G. Shu, X. Z., Z. C. O.-Y. and Li, M. (2017). Proofreading of DNA polymerase: a new kinetic model with higher-order terminal effects. J. Phys.: Condens. Matter, 29:025101.
