Towards a Large Integrated Model of Signal Transduction and Gene

Regulation Events in Mammalian Cells

Liam G. Fearnley

, Mark A. Ragan

and Lars K. Nielsen

Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia

Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia

Keywords:

Signal Transduction, Transcription, Translation, Modelling, Large Scale Models.

Abstract:

Recent work has generated whole-cell and whole-process models capable of predicting phenotype in simple

organisms. The approaches used are hindered in higher organisms and more-complex cells by a lack of kinetic

parameters for reactions and events, and the difﬁculty of measuring and estimating these. Here, we outline a

large, two-process model capable of predicting the effects of gene expression on a signal transduction network.

Our method models signal transduction and the processes involved in gene expression as two separate systems,

solved iteratively. We show that this approach is sufﬁcient to capture functionally signiﬁcant behaviour result-

ing from common network motifs. We further demonstrate that our method is scalable and efﬁcient to the size

of the largest signal transduction databases currently available. This approach enables analysis and prediction

in the absence of kinetic data, but is itself held back by the lack of detailed large-scale gene expression models.

However, research consortia such as ENCODE and FANTOM are rapidly adding to the knowledge of tran-

scriptional regulation, and we anticipate that incorporating this data into our regulatory model could allow the

modelling of complex cellular phenomena such as the structured progression seen in cellular differentiation.

1 INTRODUCTION

The recent development of the ﬁrst computational

model of an entire cell was a watershed moment in

computational biology. This model simulated indi-

vidual processes within a cell using detailed kinetic

models, integrating and updating the points at which

they interacted at discrete time intervals (Karr et al.,

2012). However, such an approach is currently lim-

ited to small organisms such as Mycobacterium gen-

italium for one major reason — data describing re-

action dynamics are simply not available for the vast

majority of processes, and measuring or estimating

these parameters is practically impossible in more-

complex cells.

In higher organisms, computational models of

signal transduction have been successfully used to

predict in vivo phenotype and signalling activity in

individual signalling pathways, including T-cell re-

ceptor signalling (Saez-Rodriguez et al., 2007), and

Wnt/MAPK signalling (Handorf and Klipp, 2012),

EGFR/ErbB signalling (Samaga et al., 2009), and

Wnt/β-catenin signalling (Kofahl and Wolf, 2010)..

With up to 500 participant entities, these models rep-

resent a relatively small fraction of the data held in

rapidly expanding, community-curated databases of

signal transduction. The Reactome database already

covers more than 7500 participants (Matthews et al.,

2008), and contains approximately 20% of proteins

annotated as signal transducers in the Gene Ontology

(via DAVID (Huang et al., 2009) analysis).

Most signal transduction models are time-

parameterized (i.e., contain some estimate of how fast

a given event proceeds). These models (e.g. (Chen

et al., 2009)) use ordinary differential equations to

represent simple three-parameter (initial species con-

centration, forward rate constant, reverse rate con-

stant) models of kinetics, with estimated and ﬁtted

values. Due to the previously discussed lack of mea-

sured reaction dynamics and kinetics, using such a

technique to model the Reactome network would re-

quire the elucidation of at least 11,600 thermody-

namic and kinetic constants and additional param-

eters in each experimental system. Even using a

vastly simpler model, such as a Boolean system with

priority classes and/or reaction timinings(e.g. (Saez-

Rodriguez et al., 2007)) requires the ﬁtting of thou-

sands of parameters and implicit comparison of sev-

eral million pairs of reaction rates. Direct measure-

ments to determine this number of parameters is not

117

G. Fearnley L., A. Ragan M. and K. Nielsen L..

Towards a Large Integrated Model of Signal Transduction and Gene Regulation Events in Mammalian Cells.

DOI: 10.5220/0004739601170122

In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS-2014), pages 117-122

ISBN: 978-989-758-012-3

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

feasible using current technology. As a result, statisti-

cal techniques have been developed (Heinrich et al.,

2002) to estimate and ﬁt models to observed data,

however the attendant risk of statistical over-ﬁt and

over-parameterisation present signiﬁcant problems at

scale.

As an alternative, time-invariant models of signal

transduction have been proposed (e.g. (Haus et al.,

2009)). These are composed of a set of partic-

ipant entities and their interactions represented as

Boolean variables and statements respectively, with

no reaction timings. It has been shown that such

models are capable of modelling systems at sizes

equivalent to the largest current signal transduction

databases (Fearnley and Nielsen, 2012). However,

signal transduction produces activated transcription

factors which control gene expression. This includes

the expression of signals and components of the sig-

nal transduction system itself. Indeed, many impor-

tant biological phenomena can be captured only by

combined transduction-transcription models, includ-

ing gene-level feedback control (e.g., SOCS in the

JAK-STAT pathway (Naka et al., 1997)) and cellular

differentiation (where new signals and components of

the transduction system are unveiled gradually over

time).

In this study, we explored a simpliﬁed model that

captures both transduction and transcription without

requiring parameterisation, guided by the insight that

signal transduction occurs on a relatively fast time-

scale (seconds or tens of seconds) compared to the

processes of transcription and translation (tens of

minutes). We demonstrate that our technique can

handle common regulatory circuits without the need

for any parameterisation, and further show its com-

putational tractability with increases in model size of

the scale needed to deal with transcription data aris-

ing from projects such as ENCODE (The ENCODE

Project Consortium, 2011). In doing so, we demon-

strate one possible mechanism for dealing with the

challenge of parameterisation.

2 METHODOLOGY

2.1 Formulation of a Sequentially

Integrated Model

Our modelling approach splits the processes of sig-

nal transduction and gene expression into two repeat-

ing stages which are modelled sequentially and sepa-

rately by ﬁrst obtaining a signal transduction network

state, then determining resulting changes to gene ex-

Integrated Model

Signal Transduction

(PATHLOGIC-S)

Transcriptional

Update Rules

Figure 1: Overview of the sequentially integrated model. A

signal transduction model with relevant experimental condi-

tions is instantiated from a database and solved. This state

is then ﬁxed, and the activities of transcription factors in

this state are passed to the transcription update rules. These

rules are applied, generating a list of states of transcription-

ally regulated proteins, feeding into a new instance of the

signal transduction model, which is solved prior to reappli-

cation of transcriptional update rules. This process repeats

until some termination criteria is reached.

pression. These changes are then re-applied to the

signalling system (Figure 1). Each of the two sub-

models is assumed to be time-invariant(i.e. the events

occur synchronously).

2.2 Signal Transduction Model

We begin by intialising the signal transduction model

with prior knowledge about network state (such as the

presence or absence of certain proteins or complexes

under some condition), and then simulate signalling

(using the PATHLOGIC-S speciﬁcation (Fearnleyand

Nielsen, 2012)), obtaining a stable state. This model

is built using databases with data available in BioPAX

Level 3 (BioPAX Consortium, 2006) format. These

data are converted into systems of logical statements

for each signal in the data as described in (Fearnley

and Nielsen, 2012). These statements take the general

form:

¬inhibitors∨

activators → event (1)

event →

signal (2)

BIOINFORMATICS2014-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

118

This system of statements can then be convertedinto a

set of constraints for use with an integer programming

solver (Haus et al., 2009), with a user-speciﬁed objec-

tive function (here, we use a function that maximises

the number of active signal transduction events). The

initial state of entities within the network is set from

user input (and may be partially speciﬁed). The in-

teger programming problem is then solved to provide

the initial network state, s

2.3 Transcription and Translation

Model

After the state of the signal transduction system, s

is ﬁxed, the observed transcription factor activity is

transferred to the transcription model (a set of tran-

scriptional update rules). Current models of gene

regulation are insufﬁciently detailed (e.g. lack infor-

mation about the post-translational modiﬁcations re-

quired to ‘activate’ a transcription factor) for integra-

tion to the signal transduction system. We derived

a simple model from the BioPAX Level 3 data, not-

ing that a more-complex model (i.e. from a genetic

regulatory network inference tool) could be used its

place. In this, a set of update rules for each protein

under transcriptional control is derived and evaluated

by summing the activities of its inhibitors and pro-

moters:

activity =



1 if

∑

promoters

−

∑

inhibitors

> 0

0 else

(3)

A new instance of the signal transduction problem

i+1

is instantiated as previously described. Once the

transcription and translation update rules have been

applied, s

i+1

is solved to determine a successor state

i+2

, and the process repeats until a termination condi-

tion is met. The ﬁrst termination condition used is full

network oscillation, i.e. whether a state has been pre-

viously encountered. If it has, the process terminates

as an oscillation has been detected. Alternatively, the

process can be terminated at a user-deﬁned number of

steps to prevent runaway computations.

3 PRELIMINARY RESULTS

The signal transduction sub-model predicts the re-

sult of signal stimuli using the largest available reac-

tion systems. The prediction generated by this sub-

model is analogous to a long-exposure view of the

state of signals in the system, showing the net ac-

tivity over each process’ assigned time period. We

model transcription and translation using state update

ERα

Estrogen TFF1Complement C3

Estrogen-ERα Estrogen-ERα

homodimer

ERα

Estrogen TFF1Complement C3

Estrogen-ERα Estrogen-ERα

homodimer

ERα

Estrogen TFF1Complement C3

Estrogen-ERα Estrogen-ERα

homodimer

ERα

Estrogen TFF1Complement C3

Estrogen-ERα Estrogen-ERα

homodimer

Figure 2: Estrogen binds dimerised estrogen receptor,

which promotes transcription of TFF1 and C3 in this sub-

network of the broader signal transduction dataset. Tran-

scription shown as a dashed line. 2a shows the initial state

of the signalling model, with presence of estrogen receptor.

2b shows introduction of estrogen to the system, and 2c sig-

nal transduction following state 2b. This consists of binding

of estrogen to estrogen receptor α and subsequent dimeri-

sation. 2d shows the result of transcriptional update —

estrogen-receptor complex promotes transcription of TFF1

and C3 (resulting signalling activity not shown).

rules analogous to those used in Boolean GRNs (de

Jong, 2002). We evaluated the effectiveness of this

approach on two key network behaviours — that of

subnetwork activation, and that of the transcription-

ally mediated feedback loop. Further, we tested to

see whether the approach was scalable to the size of

large databases of signal transduction.

3.1 Subnetwork Activation

The estrogen nuclear receptor signalling system is

well-characterised, having been implicated in a num-

ber of cancer types (Schwartz et al., 2005; Keen and

Davidson, 2003). We consider a subnetwork of a val-

idated estrogen receptor alpha (ERα) network (NCI-

Nature Pathway Interaction Database, 2012) cover-

ing the regulation of transcription of trefoil factor 1

TowardsaLargeIntegratedModelofSignalTransductionandGeneRegulationEventsinMammalianCells

119

(TFF1, UniProt:P04155) and complement C3 (C3,

UniProt:P01024) by estrogen bound to ERα (Fig-

ure 2). Simulation requires the establishment of the

initial network state and introduction of stimulus.

Subequent signal transduction is predicted and a sta-

ble state is reached, involving the activation of a tran-

scription factor that enables signalling through a pre-

viously inaccessible set of reactions (Figure 2).

3.2 Feedback Mechanisms

A more-complex case occurs in the form of tran-

scriptionally mediated negative feedback. This oc-

curs when a transcription factor causes production of

a protein that inhibits its own activation (Figure 3a).

The resulting oscillation in transcription factor activ-

ity is similar to that in negative feedback in signal

transduction (when not transcriptionally mediated),

but over a much longer time interval.

An example of negative feedback of the type de-

scribed in Figure 3a is encountered in the set of seven

interactions described in the interleukin signalling

pathway from the Panther Pathways database (Pan-

ther Pathways, 2012) (Figure 3b). Interleukin bound

to its receptors and a signalling subunit catalyse the

phosphorylation of members of the JAK family (In-

terPro:IPR016251), which in turn catalyse the phos-

phorylation of members of the STAT protein fam-

ily (InterPro:IPR001217). These events result in the

transcription of several protein families, including

members of the SOCS family (InterPro:IPR022252).

SOCS proteins inhibit JAK-mediated phosphoryla-

tion of STAT (as mediated by JAK) (Levy and Dar-

nell, 2002).

We initialise our model with the presence of JAK

and STAT in the cytoplasm, phosphorylated ERK in

the nucleus, and the proteins required for formation

of the interleukin-receptor complex (Figure 3b). The

initial signalling network solution yields the expected

activation of dimeric diphosphorylated STAT in the

nucleus (Figure 3c), which triggers the transcription

rule that activates SOCS transcription. The presence

of SOCS then inhibits the phosphorylation of STAT,

resulting in the inactivation of STAT-mediated sig-

nalling (Figure 3d). This in turn results in the absence

of diphosphorylated STAT, which deactivates trans-

lation of SOCS. This behaviour is predicted by our

modelling approach in the form of an oscillation be-

tween solutions equivalent to the signalling network

network states (Figure 3c and 3d).

The levels of SOCS protein expression have been

experimentally characterised over a period of 5.25

hours (Yoshiura et al., 2007) in a population of syn-

chronized C3H10T1/2 mouse ﬁbroblast cells. The

Boolean model predicts regular, periodic oscillation

between high and low levels of SOCS family pro-

tein, which is observed experimentally with a peri-

odicity close to that implied by the model. The model

we used is untrained and does not require the exten-

sive parameter estimation necessary for an equivalent

ODE-based model. It should, however, be noted that

experimentally observed SOCS concentrations in the

model system also display a trend towards an overall

increase over time. Our modelling approach cannot

predict this due to the discretisation of signal concen-

trations into ’present’ and ’absent’ states inherent in a

Boolean representation.

3.3 Performance and Scalability

This approach can be used on both small and large

datasets. In the case of the ERα network and the

JAK-STAT-SOCS negative feedback mechanism, the

time required for computation to meet termination

criteria was on the order of 2-6 milliseconds. It

has been shown that the underlying method of sig-

nal transduction modelling (PATHLOGIC-S) is ca-

pable of efﬁciently enumerating network states for

very large networks such as Reactome (Fearnley and

Nielsen, 2012). We used randomly generated tran-

scription update rules in order to test the feasibility of

using our modelling approach with large systems (in

the absence of sufﬁciently large regulatory networks).

These rules consisted of assignment of between 0-

10 targets to known human transcription factors in

the system. We generated 50 such random assign-

ments, and obtained an average step execution time

(that taken for one signal transduction computation

and one application of update rules) of 6.3 seconds

on a standard desktop computer.

4 DISCUSSION

The difﬁculty of using and interpreting information

describing signal transduction events increases with

the amount of detail available. The recent rise and

expansion of large-scale, single-database repositories

of data such as Reactome, Panther, PID, and Path-

way Commons has driven the development of large-

scale signal transduction models an order of mag-

nitude larger than their precursors. However, these

models are incapable of modelling interesting biolog-

ical phenomena dependent on the interactions of other

processes (such as gene expression) with the signal

transduction system.

The development of methods that periodically in-

tegrate multiple process models have enabled the

BIOINFORMATICS2014-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

120

Activator

Transcription Factor

Inhibitor

Interleukin

JAK

STAT

SOCS

Cytosol

Nucleus

Receptor Subunits

JAK

Interleukin-Receptor

Complex

STAT

p p

ERK

Interleukin

JAK

STAT

SOCS

Cytosol

Nucleus

Receptor Subunits

JAK

Interleukin-Receptor

Complex

STAT

ERK

STAT

Interleukin

JAK

STAT

SOCS

Cytosol

Nucleus

Receptor Subunits

JAK

Interleukin-Receptor

Complex

STAT

ERK

STAT

Figure 3: The general form of a transcriptionally mediated negative feedback loop consists of a transcription factor that

controls expression of an inhibitor of its own activation 3a. Vermilion species are currently active, blue are inactive, and

no state is assigned when the node background is white. ATP and ADP states not shown. 3b describes network topology

and initial conditions for the solver (Panther Pathways, 2012). 3c shows the initial solution prior to incorporation of the

transcriptional effect of diphosphorylated STAT (the dashed line, which represents STAT acting as a promoter for SOCS

transcription). 3d shows the second solution of the network following transcription. The system oscillates between 3c and 3d

as the activity of diphosphorylated STAT is toggled by the negative feedback loop.

modeling of entire simple organisms. Such methods

hold great promise, but are hindered by a lack of avail-

able reaction kinetic data and the difﬁculty inherent in

ﬁtting or measuring these. Here we describe a method

that uses a broad estimate of the kinetics of entire pro-

cesses (ie, the time taken to reach a stable state in a

system of biochemical reactions) rather than that of

their components in order to predict phenotype, with

promising initial results.

There are two assumptions inherent with this tech-

nique. Firstly, there is an assumption that the post-

translational modiﬁcations in the signal transduction

system reach a stable state in a time-scale an order

of magnitude faster than the process of gene expres-

sion. Secondly, it assumes that the combined pro-

cesses of gene expression and signal transduction can

be approximated as occurring asynchronously given

two individual process models that are synchronously

updated (albiet over a period of time, to some stable

state). The extent to which these assumptions hold

at the scale of a large scale experimental system is

uncertain, and will require signiﬁcant ongoing valida-

tion and reﬁnement. As more data about the kinetics

of these events becomes available, we anticipate the

transformation of the model into a set of sub-process

models of varying granularity and size, eventually be-

coming a system of ODEs as the kinetic landscape is

explored, following the example laid out in simpler

organisms (Karr et al., 2012).

For now, we have demonstrated that an approach

lacking any kinetic parameters bar the broadest ap-

proximation of entire processes is capable of captur-

ing the essence of oscillatory regulatory motifs such

as the STAT/SOCS negative feedback system, and

that the system is computationally tractable as the

size of the model increases. It is an open question

whether large-scale signal transduction models com-

bined with large-scale models of gene expression pro-

TowardsaLargeIntegratedModelofSignalTransductionandGeneRegulationEventsinMammalianCells

121

duce biologically meaningful results, due to the afore-

mentioned lack of gene expression data and models.

Datasets describing the activities of gene expression

are becoming available as projects such as ENCODE

and FANTOM5 begin to publish results. We antici-

pate that once such data are available, our model for-

mulation may be used to simulate cellular-scale sig-

nal transduction over time. We hypothesise that the

sequential, synchronised predictions of gene expres-

sion that our modelling technique generates will map

to the structured progression seen in differentiating

cells, and prove a valuable explanatory and predictive

tool in such contexts.

REFERENCES

BioPAX Consortium (2006). BioPAX: Biological pathways

exchange. Available online at http://www.biopax.org.

Retrieved June 2011.

Chen, W. W., Schoeberl, B., Jasper, P. J., Niepel, M.,

Nielsen, U. B., Lauffenburger, D. A., and Sorger,

P. K. (2009). Input-output behavior of ErbB signaling

pathways as revealed by a mass action model trained

against dynamic data. Mol. Syst. Biol., 5:239.

de Jong, H. (2002). Modeling and simulation of genetic

regulatory systems: a literature review. J. Comput.

Biol., 9(1):67–103.

Fearnley, L. G. and Nielsen, L. K. (2012). PATHLOGIC-S:

a scalable Boolean framework for modelling cellular

signalling. PLoS One, 7(8):e41977.

Handorf, T. and Klipp, E. (2012). Modeling mechanistic

biological networks: an advanced boolean approach.

Bioinformatics, 28(4):557–563.

Haus, U.-U., Niermann, K., Truemper, K., and Weisman-

tel, R. (2009). Logic integer programming models for

signaling networks. J. Comput. Biol., 16(5):725–743.

Heinrich, R., Neel, B. G., and Rapoport, T. A. (2002).

Mathematical models of protein kinase signal trans-

duction. Mol. Cell, 9(5):957–970.

Huang, D. W., Sherman, B. T., and Lempicki, R. A. (2009).

Systematic and integrative analysis of large gene lists

using david bioinformatics resources. Nat. Protoc.,

4(1):44–57.

Karr, J., Sanghvi, J., Macklin, D., Gutschow, M., Jacobs, J.,

Bolival, B., Assad-Garcia, N., Glass, J., and Covert,

M. (2012). A whole-cell computational model pre-

dicts phenotype from genotype.

Keen, J. C. and Davidson, N. E. (2003). The biology of

breast carcinoma. Cancer, 97(3 Suppl):825–833.

Kofahl, B. and Wolf, J. (2010). Mathematical modelling

of Wnt/β-catenin signalling. Biochem. Soc. Trans.,

38(5):1281–1285.

Levy, D. E. and Darnell, J. E. (2002). STATs: transcrip-

tional control and biological impact. Nat. Rev. Mol.

Cell Biol., 3(9):651–662.

Matthews, L., Gopinath, G., Gillespie, M., Caudy, M.,

Croft, D., de Bono, B., Garapati, P., Hemish, J., Herm-

jakob, H., Jassal, B., Kanapin, A., Lewis, S., Mahajan,

S., May, B., Schmidt, E., Vastrik, I., Wu, G., Birney,

E., Stein, L., and D’Eustachio, P. (2008). Reactome

knowledgebase of biological pathways and processes.

Nucleic Acids Res., 37:D619–22. PMID: 18981052.

Naka, T., Narazaki, M., Hirata, M., Matsumoto, T., Mi-

namoto, S., Aono, A., Nishimoto, N., Kajita, T., Taga,

T., Yoshizaki, K., Akira, S., and Kishimoto, T. (1997).

Structure and function of a new STAT-induced STAT

inhibitor. Nature, 387(6636):924–929.

NCI-Nature Pathway Interaction Database (2012). Vali-

dated estrogen receptor alpha network.

Panther Pathways (2012). Interleukin signaling pathway.

Retrieved December, 2012.

Saez-Rodriguez, J., Simeoni, L., Lindquist, J. A., Hemen-

way, R., Bommhardt, U., Arndt, B., Haus, U.-

U., Weismantel, R., Gilles, E. D., Klamt, S., and

Schraven, B. (2007). A logical model provides in-

sights into T-cell receptor signaling. PLoS Comput.

Biol., 3(8):e163.

Samaga, R., Saez-Rodriguez, J., Alexopoulos, L. G.,

Sorger, P. K., and Klamt, S. (2009). The logic

of EGFR/ErbB signaling: theoretical properties and

analysis of high-throughput data. PLoS Comput. Biol.,

5(8):e1000438.

Schwartz, A. G., Prysak, G. M., Murphy, V., Lonardo, F.,

Pass, H., Schwartz, J., and Brooks, S. (2005). Nu-

clear estrogen receptor beta in lung cancer: expression

and survival differences by sex. Clin. Cancer Res.,

11(20):7280–7287.

The ENCODE Project Consortium (2011). A user’s guide

to the Encyclopedia of DNA Elements (ENCODE).

PLoS Biol., 9(4):e1001046.

Yoshiura, S., Ohtsuka, T., Takenaka, Y., Nagahara, H.,

Yoshikawa, K., and Kageyama, R. (2007). Ultradian

oscillations of Stat, Smad, and Hes1 expression in re-

sponse to serum. PNAS, 104(27):11292–11297.

BIOINFORMATICS2014-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

122