Learning Dynamic Systems from Time-series Data
An Application to Gene Regulatory Networks
Ivo J. P. M. Timoteo and Sean B. Holden
University of Cambridge, Computer Laboratory,
William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, U.K.
Keywords: Graphical Models, Local Search, Bioinformatics Application.
Abstract: We propose a local search approach for learning dynamic systems from time-series data, using networks of differential equations as the underlying model. We evaluate the performance of our approach for two scenarios: first, by comparing with an l1-regularization approach under the assumption of a uniformly weighted network for identifying systems of masses and springs; and then on the task of learning gene regulatory networks, where we compare it with the best performers in the DREAM4 challenge using the original dataset for that challenge. Our method consistently improves on the performance of the other methods considered in both scenarios.
1 INTRODUCTION
We address the following general inference problem.
We have a partially observable system consisting of
many components, some of which may interact with
one another. Our aim is to infer which components
interact, and possibly the nature of the interaction.
Such systems will typically be partially observable in
the sense that, while measurements over time of one
or more properties of the system are available, these
time series do not directly expose the underlying in-
teractions.
In studying problems of this kind we have in mind
the specific application of inferring gene regulatory
networks (GRNs). The action of a gene within an or-
ganism is often regulated and coordinated by its in-
teraction with other genes, and considerable effort is
invested in trying to discover exactly how genes in-
teract. It is hoped that a better understanding of the
network of interactions might lead to improved under-
standing of the basic processes occurring in a cell, and
thus of how specific traits or diseases develop. In turn
this might lead to new techniques to treat disease or
control cell development. Inferring such relationships
from observed data is, however, difficult because the
network is large and contains redundant paths, and be-
cause data is generally noisy due to the nature of the
experimental techniques used.
Biological processes are dynamic and we are in-
terested in interactions between molecules occurring
over time and in highly complex networks of interde-
pendent nodes. This suggests that in order to under-
stand a GRN it will be necessary to consider obser-
vations at multiple time instances, allowing us to ob-
serve temporal patterns and thus to better understand
causality and predict future behaviour. As a result re-
searchers have increasingly collected time-series gene
expression data—according to (Barrett et al., 2011),
the number of time-series datasets has been growing
exponentially.
Given the dynamic and temporal nature of the typ-
ically available data, it is natural to model gene ex-
pression networks using stochastic differential equa-
tions as suggested by (Ackers et al., 1982), (Gard-
ner and Collins, 2000) and (Marbach et al., 2010);
see also (Ly and Lipson, 2012) and (Voortman et al.,
2010) for further, related methods. Further, it seems
reasonable to assume that a similar model could
be used to learn the structure of the network it-
self, although we might expect this to be a compu-
tationally expensive process given the use of such
expressive models—for example, the one used by
GeneNetWeaver (Schaffter et al., 2011) and explained
in (Marbach et al., 2010).
In this paper we propose a simple approach for
the general problem of learning the structure of sys-
tems of differential equations. We demonstrate its
effectiveness first by comparing it to an existing l1-regularization method using data for a system of masses and springs. We then apply it to the prob-
lem of inferring GRNs. We address the DREAM4
Challenge data (Dream4Challenge, 2009) and com-
pare our method against the methods that performed
best in the challenge. These methods did not take time
series data into account, and we find that our method
can improve on their performance by using them to
obtain an initial solution which it then develops fur-
ther using time series data.
2 INTERACTING DYNAMIC SYSTEMS
2.1 General Model
We have a system of p time-dependent variables x^T(t) = [x_1(t) ··· x_p(t)], and describe the evolution of each variable using differential equations relating it to other variables in the system. The evolution of the ith variable x_i(t) is described by the expression

dx_i(t) = F_i(x(t)) dt.    (1)
The functions F_i are selected from a family F of functions defined as follows. Begin with a set of m basis functions f = {f_1(·), f_2(·), ..., f_m(·)}; these will be chosen according to the domain of interest. Class F consists of all functions of the form

F(x) = Σ_{j=1}^{p} Σ_{k=1}^{m} a_{jk} f_k(x_j(t)),    (2)

where the parameters a_{jk} define a specific F.
Given the set f of basis functions, define f^T(x) = [f_1(x) ··· f_m(x)] and the stacked vector

f^T(x) = [f^T(x_1) ··· f^T(x_p)].

Then the full system of p variables can be described by

dx(t) = A f(x(t)) dt    (3)

where A is a p × pm matrix containing p sets of parameters a_{jk}.
Our goal is as follows: given the set f of basis functions and an observed trajectory

X_{0:T} = {x(0), x(η), ..., x(T)}    (4)

we wish to recover the structure of the system by identifying A. In (4), η is the interval between observed values, and is assumed to be constant.
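To make the notation concrete, the following Python sketch (our own illustration; the basis set, dimensions and names are hypothetical) builds the stacked vector f(x) and evaluates the drift A f(x(t)) in (3):

```python
import numpy as np

# Toy basis set f = {f_1, f_2}: identity and a cubic term (an illustrative choice only).
basis = [lambda x: x, lambda x: x**3]

def stacked_basis(x, basis):
    """Return [f_1(x_1), ..., f_m(x_1), ..., f_1(x_p), ..., f_m(x_p)] for a state vector x."""
    return np.array([f(xj) for xj in x for f in basis])

p, m = 3, len(basis)
rng = np.random.default_rng(0)
A = rng.choice([-1.0, 0.0, 1.0], size=(p, p * m))   # hypothetical p x pm parameter matrix
x = rng.normal(size=p)                              # current state x(t)
drift = A @ stacked_basis(x, basis)                 # right-hand side of dx(t) = A f(x(t)) dt
```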
Example: Masses and Springs
Consider a dynamic system of masses connected by
springs. From the observation X
0:T
of the positions
of the masses over a time interval [0,T] we aim to in-
fer which masses are connected by springs. In this
case we know precisely which basis functions to use
as they follow directly from Newtonian mechanics.
Also, the matrix A now contains elements with val-
ues in {0,1} representing the presence or absence of
a spring.
Example: Gene Regulatory Networks
As noted in the introduction it is natural to model
gene expression networks using stochastic differen-
tial equations, and this is our central aim in what fol-
lows. Further, just as in the example of masses and
springs, we aim to infer useful information regarding
the structure of the network by inferring the matrix A.
2.2 Learning Algorithm
Since we observe a trajectory X_{0:T} with interval η between consecutive values we should consider a discrete model. The natural discretization of (3) is to rewrite it as

x(t + 1) = x(t) + η A f(x(t))    (5)

where x(t) and x(t + 1) are consecutive observations. Assuming that our model is perturbed by Gaussian noise we employ the least squares error function

Error(A) = Σ_{i=1}^{p} Σ_{t=0}^{n−1} [x_i(t + 1) − x_i(t) − η A_i f(x(t))]²,    (6)

where A_i is the ith row of A. This error function has been shown to be convex when A is real-valued (Banerjee et al., 2008).
Our proposed algorithm assumes that A has elements A_{i,j} ∈ W, where W is a finite set. This assumption is not too restrictive as in many interesting problems the existence of an edge and its nature are all we are looking for. For example, in addressing gene interactions we are mainly interested in identifying whether a gene inhibits, enhances or does not influence other genes. In this case W = {−1, 0, 1} would be an appropriate choice provided we have an adequate set f of basis functions. Also, determining the structure of the system in this way can form a first step, making the edge weights easier to estimate afterwards.
Even though (6) is convex for a real-valued A, our choice to restrict A_{i,j} to W entails the loss of that guarantee. Nevertheless, we show that restricting the search space according to the domain of interest, despite the loss of convexity, can lead to improved per-
formance (Experiment 1).
LearningDynamicSystemsfromTime-seriesData-AnApplicationtoGeneRegulatoryNetworks
325
We use a local search strategy to avoid searching all |W|^{p²m} possible structures. We start with an arbi-
trary structure—perhaps chosen according to the spe-
cific domain to reduce the search time—and gener-
ate new candidates by selecting an edge and altering
the value among the possibilities in W. If the change
results in a decrease in the error, we keep the new
structure and restart. When we cannot generate new
candidates that decrease the error we stop the search.
This strategy can no longer guarantee that we achieve
a global minimum so we consider additional strate-
gies that may help us avoid local minima. For con-
venience, we will refer to this approach as flip-edge
throughout the paper due to the way new structures
are generated.
It is of interest to note that the error function used is decomposable, in the sense that is familiar from probabilistic graphical models (Koller and Friedman, 2009). This leads to a simple method for quickly evaluating the error. The error function is a sum of p summands, one for each variable and each itself having limited dependencies. The total error is a sum of individual node errors

Error(A) = Σ_{i=1}^{p} nodeError_i(A)    (7)

where

nodeError_i(A) = Σ_{t=0}^{n−1} [x_i(t + 1) − x_i(t) − η A_i f(x(t))]².    (8)
This is useful as our algorithm, from an initial structure, generates neighbouring structures based on one edge flip, and we want to minimise the error function. Let e_{(j,k),i} be the edge from variable j to variable i associated with f_k. We are seeking the value w ∈ W for e_{(j,k),i} that minimises the error function. Since e_{(j,k),i} is only associated with row A_i we know that minimising the error of A is equivalent to minimising nodeError_i(A). Let δ(e) be a function that returns the w minimising Error(A) when flipping edge e. This function can easily be cached as it will only change when some other element of the relevant A_i is changed.
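A minimal Python sketch of the flip-edge search as we understand it from the description above (function and variable names are ours, and the scan order over edges is one of several reasonable choices); it exploits the decomposition (7)-(8) so that flipping an edge only requires re-evaluating a single node error:

```python
import itertools
import numpy as np

def node_error(i, A, X, eta, basis_fn):
    """nodeError_i(A): squared one-step prediction error for variable i, eq. (8)."""
    err = 0.0
    for t in range(X.shape[0] - 1):
        prediction = X[t, i] + eta * A[i] @ basis_fn(X[t])
        err += (X[t + 1, i] - prediction) ** 2
    return err

def flip_edge_search(A, X, eta, basis_fn, W=(-1, 0, 1)):
    """Greedy local search over structures whose entries are restricted to the finite set W."""
    p, pm = A.shape
    errors = np.array([node_error(i, A, X, eta, basis_fn) for i in range(p)])
    improved = True
    while improved:
        improved = False
        for i, col in itertools.product(range(p), range(pm)):
            best_w, best_err = A[i, col], errors[i]
            for w in W:                                    # try every alternative value for this edge
                if w == A[i, col]:
                    continue
                old = A[i, col]
                A[i, col] = w
                err = node_error(i, A, X, eta, basis_fn)   # only node i's error changes
                A[i, col] = old
                if err < best_err:
                    best_w, best_err = w, err
            if best_w != A[i, col]:                        # keep the flip if it decreases the error
                A[i, col], errors[i] = best_w, best_err
                improved = True
    return A
```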
3 EXPERIMENTS
3.1 Masses and Springs
We first compare the flip-edge algorithm with an l1-regularised least squares (l1-rls) algorithm in a masses
and springs scenario as presented in (Bento et al.,
2010) and introduced above. We will assume rest dis-
tances, masses and elastic coefficients are equal to 1
and focus on inferring which masses are connected.
The general equations that model the dynamics of such a system are

dq(t) = v(t) dt
dv(t) = −∇U(q(t)) dt + σ dW(t)
U(q_i(t)) = (1/2) Σ_j A_{i,j} (||q_i − q_j|| − D_{i,j})²    (9)

where q is the position, v is the velocity, D_{i,j} is the resting distance, dW(t) denotes the Wiener process and σ² is the noise variance. The simulation is performed using the Euler-Maruyama method with a time step of 0.01. We sample using η = 0.1 and use σ = 0.2 unless explicitly stated.
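For illustration, trajectories of this kind could be generated with an Euler-Maruyama integrator along the following lines (our own sketch, assuming unit masses and unit elastic coefficients as stated above; q0 holds the initial 2-D positions):

```python
import numpy as np

def spring_force(q, A, D):
    """Force on each mass from unit-stiffness springs: -grad of (1/2) sum_j A_ij (||q_i - q_j|| - D_ij)^2."""
    force = np.zeros_like(q)
    p = q.shape[0]
    for i in range(p):
        for j in range(p):
            if i == j or A[i, j] == 0:
                continue
            diff = q[i] - q[j]
            dist = np.linalg.norm(diff)
            if dist > 0:
                force[i] -= A[i, j] * (dist - D[i, j]) * diff / dist
    return force

def simulate(A, D, q0, T=20.0, dt=0.01, eta=0.1, sigma=0.2, seed=0):
    """Euler-Maruyama integration of (9); returns positions sub-sampled every eta time units."""
    rng = np.random.default_rng(seed)
    q, v = q0.astype(float).copy(), np.zeros_like(q0, dtype=float)
    keep_every = int(round(eta / dt))
    trajectory = [q.copy()]
    for step in range(int(round(T / dt))):
        f = spring_force(q, A, D)                  # force at the start of the step
        q = q + v * dt
        v = v + f * dt + sigma * np.sqrt(dt) * rng.normal(size=v.shape)
        if (step + 1) % keep_every == 0:
            trajectory.append(q.copy())
    return np.array(trajectory)
```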
The same dynamic system was previously used as an example by (Bento et al., 2011), though with a different target structure. (Bento et al., 2010) propose the l1-rls algorithm for the recovery of the structure of a network of stochastic differential equations (though their focus is on the theoretical bounds of the recovery rather than on performance). The idea is to recover each row of the support matrix A independently by solving a convex optimisation problem. For each row i, A_i is estimated by minimising

L(A_i; X_{0:T}) + λ ||A_i||_1    (10)

where

L(A_i; X_{0:T}) = (1/(2η²n)) Σ_{t=0}^{T} [x_i(t + 1) − x_i(t) − η A_i f(x(t))]².    (11)
We used the CVX package (Grant and Boyd, 2008;
Grant and Boyd, 2012) to describe and solve the con-
vex program.
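The paper solves (10)-(11) with the MATLAB CVX package; a roughly equivalent sketch using the Python package cvxpy (a substitution on our part, not the authors' code) would be:

```python
import cvxpy as cp
import numpy as np

def l1_rls_row(i, X, eta, basis_fn, lam):
    """Estimate row A_i by minimising L(A_i; X_{0:T}) + lam * ||A_i||_1, eqs. (10)-(11)."""
    n = X.shape[0] - 1
    F = np.array([basis_fn(X[t]) for t in range(n)])    # n x pm matrix of stacked basis values
    dxi = X[1:, i] - X[:-1, i]                          # observed increments of variable i
    a = cp.Variable(F.shape[1])
    loss = cp.sum_squares(dxi - eta * (F @ a)) / (2 * eta ** 2 * n)
    problem = cp.Problem(cp.Minimize(loss + lam * cp.norm1(a)))
    problem.solve()
    return a.value   # a threshold tau is then applied to |A_i| to decide which edges exist
```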
Our experiments used the following protocol: a
network is generated, the behaviour of this network
over an interval of time is simulated, and the result-
ing data is supplied to the algorithms. Each network
is generated as an N × M grid of masses, separated
along each axis by a unit of distance and connected
to all eight neighbours where possible. To generate
different structures, masses and springs are removed
with probability 0.1. We guarantee that all the springs
in the final structure are connected to two masses.
In the experiments a 5 × 5 initial grid is used except
when explicitly studying the performance under dif-
ferent network sizes.
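A sketch of this network-generation protocol (illustrative only; details such as the order in which masses and springs are removed are our reading of the text):

```python
import numpy as np
from itertools import product

def random_grid_network(N=5, M=5, p_remove=0.1, seed=0):
    """N x M grid, 8-neighbour springs; masses and springs independently removed w.p. p_remove."""
    rng = np.random.default_rng(seed)
    kept = {rc for rc in product(range(N), range(M)) if rng.random() > p_remove}
    springs = []
    for (r, c) in kept:
        for dr, dc in product((-1, 0, 1), repeat=2):
            nb = (r + dr, c + dc)
            if (dr, dc) <= (0, 0) or nb not in kept:   # visit each unordered neighbour pair once
                continue
            if rng.random() > p_remove:                # spring survives its own removal draw
                springs.append(((r, c), nb))           # both endpoints kept, so no dangling springs
    return kept, springs
```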
Our goal was to compare our method against an optimally tuned instance of l1-rls. We therefore started by running preliminary experiments to study the response of the l1-rls algorithm to different values of its parameters λ and τ (a threshold above which we consider the existence of an edge). The preliminary study led to the following approach: we run the algorithm for λ ∈ {0, 0.05, 0.10, ..., 0.90, 0.95} and τ ∈ {0, 0.1, 0.2, ..., 1.8, 1.9} and use the best pair (λ, τ), chosen by direct comparison against the results obtained with the known correct structure, to extract the structure for comparison with our algorithm. Furthermore, we guaranteed that the chosen pair was a local minimum by 8-neighbour comparison in the λ × τ space considered.
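For concreteness, this tuning protocol could be organised as in the following sketch (the helpers estimate_A and structure_error are hypothetical placeholders for solving (10)-(11) and counting misclassified edges):

```python
import numpy as np

def tune_l1_rls(estimate_A, structure_error, true_A):
    """Grid-search lambda and tau for l1-rls, scoring against the known correct structure."""
    lambdas = np.arange(0.0, 1.0, 0.05)
    taus = np.arange(0.0, 2.0, 0.1)
    errors = np.full((len(lambdas), len(taus)), np.inf)
    for a, lam in enumerate(lambdas):
        A_hat = estimate_A(lam)                          # solve (10)-(11) once per lambda
        for b, tau in enumerate(taus):
            A_thr = (np.abs(A_hat) > tau).astype(int)    # threshold to obtain a structure
            errors[a, b] = structure_error(A_thr, true_A)
    a, b = np.unravel_index(np.argmin(errors), errors.shape)
    # documents the 8-neighbour local-minimum check in (lambda, tau) space
    neighbours = [errors[a + da, b + db]
                  for da in (-1, 0, 1) for db in (-1, 0, 1)
                  if (da, db) != (0, 0)
                  and 0 <= a + da < len(lambdas) and 0 <= b + db < len(taus)]
    assert errors[a, b] <= min(neighbours)
    return lambdas[a], taus[b]
```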
In a realistic setting, we would not be able to tune
l1-rls using the correct structure. Nevertheless, we
wanted to compare our suggested flip-edge algorithm
with the best possible l
1
-rls results.
Figure 1 shows the results from these experiments. These results suggest that flip-edge, while undoubtedly a straightforward algorithm, performs very well on uniformly weighted networks. In fact, in our experiments it surpasses the l1-regularization approach common in the literature in its ability to recover the structure of the network. Specifically, it proved more robust to noise, to shorter observed trajectories, and to larger networks (for the same length of trajectory). Furthermore, the fact that it does not require tuning of parameters, for which there seems to be no obvious method in the l1-rls algorithm, makes it a more straightforward method to use.
3.2 Learning Gene Regulatory Networks from Time-series Knockout Data
We assessed our algorithm in the context of identify-
ing GRNs by employing data from the DREAM4 In
Silico Network Challenge (Dream4Challenge, 2009)
(see also (Marbach et al., 2010; Marbach et al., 2009;
Prill et al., 2010)). This allowed us to assess our al-
gorithm against many other methods—at least 25 in
the challenge itself—evaluated under the same condi-
tions.
Time-series data are especially useful if gathered
after applying perturbations to the regulatory equilib-
rium of the cell, as the relationships are then more
apparent than in a stable cell where the expression of
each gene remains reasonably constant. A common
experiment is to knock out a gene, that is, to com-
pletely block its expression, and observe the system
as it re-adapts. With this in mind, the DREAM4 chal-
lenge involves inferring the structure of five gene reg-
ulatory networks, each with 100 genes. Two were ex-
tracted from E. coli and three from Yeast. We are pro-
vided with data on the steady-state expression levels
of the wild-type, and after the knockout and knock-
down of every gene. (Knockdowns reduce the expres-
sion of some gene to half of that in the wild-type.) We
are also provided with 10 time-series of 21 time points
where an unknown perturbation is applied at the start
and released half way through the time-series.
We used the three best-performing methods from
the DREAM4 Challenge as comparisons for our
method. Pinna et al. (Pinna et al., 2010) presented
the best method by applying a pruning procedure on
top of z-score to remove potentially redundant edges.
This algorithm has a parameter—denoted t in the
original paper and renamed to θ in the following—
used to decide when an edge should be included. The
z-score method of Prill et al. (Prill et al., 2010) was
placed second and the TRANSWESD method of Klamt
et al. (Klamt et al., 2010) achieved the third best per-
formance with a variant of transitive reduction for
weighted signed digraphs.
It is of interest to note that the three top scorers in the DREAM4 Challenge did not make any use of the time-series data, even though it seems clear that these data might provide interesting insights into the correct structure. Figure
2 shows an example of three genes (P: dotted line,
D: dashed line and S: solid line) where there are in-
hibitory relationships from P to S, P to D and S to D.
However, from the observation of the initial and final
states only, it would be impossible to infer anything
more than the inhibitory relationship from P to S.
We applied the flip-edge algorithm to this data.
Our method relies on time-series data to improve the
inferences made by the existing static methods. Un-
like in the example using masses and springs we do
not in this case know a correct set of basis functions.
Bonneau et al. (Bonneau et al., 2006) suggest the use
of the logistic sigmoid (while ultimately using a lin-
ear approximation) and this agrees with our initial in-
tuition. The model thus becomes, for some general
gene i,
dx_i(t) = [−x_i(t) + F(A_i, β, x(t))] dt + σ dW(t)    (12)

where
F(A_i, β, x(t)) = max_eq(x_i) / (1 + exp(Σ_j A_{i,j} β_j x_j(t))).    (13)

Here, dW(t) is a Wiener process, β_j denotes the weight the expression of the jth gene would have on the activation of the other genes, and max_eq(x_i) denotes the maximum observed value of x_i in an equilibrium state. We added the scaling factor max_eq(x_i) since the observed maximum expression values may vary considerably for each variable. However, this results in a need to estimate β.
LearningDynamicSystemsfromTime-seriesData-AnApplicationtoGeneRegulatoryNetworks
327
Figure 1: Identification of systems of masses and springs. We plot the number of incorrectly classified edges against, respec-
tively: noise variance, size of the observed time-series, and size of the structure being learnt. Each data point was obtained by
running the algorithm 10 times; the bars correspond to one standard deviation.
Figure 2: Example of the evolution of the expression of
three genes where time-series data, rather than just the ob-
servation of the initial and final state, is crucial to infer all
three relationships.
With the above model, the error function becomes

Error(A, β) = Σ_{i=1}^{p} Σ_t [ (x_i(t + 1) − x_i(t))/η + x_i(t) − max_eq(x_i)/(1 + exp(Σ_j A_{i,j} β_j x_j(t))) ]².    (14)
We used W = {−1, 0,1} as we are interested in
whether a gene is inhibiting, enhancing or just unre-
lated to the expression of some other gene. We ran all
experiments with η = 0.001 after observing the gene
expression changes, from step to step, in the time se-
ries data.
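A short sketch of how the error function (14) can be evaluated (our own illustrative code; max_eq holds the maximum equilibrium expression observed for each gene):

```python
import numpy as np

def grn_error(A, beta, X, eta, max_eq):
    """Error(A, beta) of eq. (14); X is (T+1) x p, A is p x p with entries in {-1, 0, 1}."""
    total = 0.0
    for t in range(X.shape[0] - 1):
        x = X[t]
        F = max_eq / (1.0 + np.exp((A * beta) @ x))   # eq. (13) for every gene at once
        residual = (X[t + 1] - x) / eta + x - F
        total += np.sum(residual ** 2)
    return total
```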
The flip-edge algorithm was modified as follows
to incorporate the estimation of β:
1. Generate an initial network structure.
2. Use the current structure to estimate β.
3. Run flip-edge using the estimated β.
In step 1 of the algorithm we generated prior struc-
tures using the two most successful methods from the
DREAM4 Challenge, and using different values for
θ. After this step we have the first candidate A.
In step 2 we estimate β by performing ordinary least squares multivariate regression. Note that in this algorithm β_i is constant—it does not itself vary with time. The motivation for this assumption was that β would be modelling transcription rates rather than the effectiveness of the resulting protein in influencing the transcription of the other genes. For the estimation of β we only consider the gene expressions at stable states such as the wild-type—that is, the expression values of the non-perturbed network—and the stable states after the knockout of a single gene. In a general stable state s we have ds_i(t)/dt = 0, along with s_i = F(A_i, β, s) and

(A_i ∘ s^T) β = log(max_eq(s_i)/s_i − 1) = g(s_i)    (15)

where ∘ denotes the Hadamard product. That is, the expression of the gene s_i is regulated by its parents (A_i ∘ s^T) weighted by β. Let S be the square matrix of stacked s^T and define g^T(s) = [g(s_1) ··· g(s_p)] such that

g(s) = (A ∘ S) β.    (16)

Finally, let y be a vector of stacked g(s) and X a matrix of stacked A ∘ S, so that we can estimate β using all the instances of stable states available. We have

y = Xβ    (17)

and we want to estimate β by least squares

argmin_β Σ_{i=1}^{k} (y_i − X_i β)^T (y_i − X_i β)    (18)

where k denotes the number of stable-state data instances, y_i corresponds to the ith g(s) and X_i corresponds to the ith (A ∘ S), which gives us

β = (X^T X)^{−1} X^T y.    (19)
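A sketch of this estimation step (illustrative; stable_states would contain the wild-type and per-gene knockout steady-state expression vectors, and the computation assumes 0 < s_i < max_eq(s_i)):

```python
import numpy as np

def estimate_beta(A, stable_states, max_eq):
    """Ordinary least squares estimate of beta from stable states, following eqs. (15)-(19)."""
    X_rows, y_rows = [], []
    for s in stable_states:                      # each s is a length-p steady-state expression vector
        g = np.log(max_eq / s - 1.0)             # g(s_i) = log(max_eq(s_i)/s_i - 1), eq. (15)
        X_rows.append(A * s[np.newaxis, :])      # A ∘ S: row i is A_i ∘ s^T
        y_rows.append(g)
    X = np.vstack(X_rows)                        # stacked (A ∘ S) blocks, eq. (17)
    y = np.concatenate(y_rows)                   # stacked g(s) vectors
    # beta = (X^T X)^{-1} X^T y, computed with a least-squares solver for numerical stability
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```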
In step 3 we use A from step 1 and β from step 2, and run flip-edge gradient descent using (14), which is also decomposable. When applying flip-edge we decided to use only the second half of the time-series data provided; that is, from the point when the perturbation is lifted, as we do not know the exact nature of the perturbation.
ICPRAM2015-InternationalConferenceonPatternRecognitionApplicationsandMethods
328
Table 1: Comparison of the results of other methods with our method, run for three rounds of estimation and flip-edge, with different values of θ.

                overall    auroc     aupr
Pinna et al.     71.589   40.110   103.068
z-score          71.297   39.737   102.857
TRANSWESD        64.715   37.324    92.106
θ = 2.5          78.232   48.684   107.779
θ = 3            79.116   49.278   108.953
θ = 3.5          82.962   52.707   113.217
θ = 4            80.955   50.584   111.326
θ = 4.5          77.892   49.051   106.732
The DREAM4 Challenge evaluated performance using the p-values for the area under the ROC curve (AUROC) and area under the precision-recall curve (AUPR) measures for each network—see (Schaffter et al., 2011) and (Dream4Challenge, 2009) for a complete description. The p-values provide an indication of the probability that random predictions would be of equal or better quality than those produced by the method of interest. Let p_AUROC and p_AUPR be the geometric means of the corresponding p-values for the five networks of the dataset. The scores are given by

auroc score = −log10(p_AUROC)
aupr score = −log10(p_AUPR)    (20)
overall score = −0.5 log10(p_AUROC × p_AUPR).
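As a small illustration of (20) (our own code; p_auroc and p_aupr would hold the five per-network p-values):

```python
import numpy as np

def dream4_scores(p_auroc, p_aupr):
    """Compute the auroc, aupr and overall scores of eq. (20) from per-network p-values."""
    gm_auroc = np.exp(np.mean(np.log(p_auroc)))    # geometric mean over the five networks
    gm_aupr = np.exp(np.mean(np.log(p_aupr)))
    auroc_score = -np.log10(gm_auroc)
    aupr_score = -np.log10(gm_aupr)
    overall = -0.5 * np.log10(gm_auroc * gm_aupr)  # equals the mean of the two scores
    return overall, auroc_score, aupr_score
```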
Figure 3 (a) shows the performance of our method
alongside that of the top three methods presented at
DREAM4. (As pointed out in (Schaffter et al., 2011),
the best performer in the DREAM3 Challenge (Yip
et al., 2010) did not repeat its previous success. The
method presented in (Yip et al., 2010) uses an ap-
proach similar to ours in trying to learn first from
knockout and then from time-series data using differ-
ential equation models.)
Figure 3 (b) shows how the performance of our
algorithm is affected by using different methods to
choose the initial structure. Clearly the use of z-score
or the method of Pinna et al. has little effect on the
score achieved by our algorithm. This suggests that
our algorithm is improving due to the time series data,
independently of the initial structure and most likely
improving on different features than those targeted by
Pinna et al.'s pruning step. The average difference in
the score over the values of θ considered is 0.052 in
favour of Pinna et al., which is considerably smaller
than the original difference before the application of
our method (see Table 1).
Unfortunately, comparison using only the data
of the DREAM4 Challenge does not allow us to
have statistical confidence in the performance of
our method, since we address only one prediction
task. However, the GeneNetWeaver distribution in-
cludes the kinetic models of the networks used in the
DREAM4 Challenge. This allows us to generate new
data from the same networks. We ran our method
as described above with θ = 3.5. We used 10 new
datasets from each individual network and we com-
pared our score with the score obtained by our im-
plementation of Pinna et al.'s method, in this case using θ ∈ {2, 2.5, 3, 3.5} and finding that θ = 3.0 max-
imised its average score over the set of 50 prediction
tasks (ten datasets drawn from five networks). In Fig-
ure 4, we present the percentage improvement of our
method over Pinna et al. with θ = 3.0 for the ten
datasets for each of the five DREAM4 Challenge net-
works. In Figure 4 the boxes and whiskers lie entirely above zero. Furthermore, the
p-values for the t-test on the improvement for the dif-
ferent networks are 0.00064, 1.43057e-09, 0.00637,
1.88231e-05, and 6.6216e-10 meaning that the im-
provement of our method over the best scorer at the
DREAM4 Challenge is strongly significant. In fact, it shows a surprisingly large improvement on some networks, suggesting it is considerably more effective at discovering particular graphical features.
4 FURTHER REFINEMENTS
Since the local search approach cannot guarantee to
end at the global minimum of the error function it is
interesting to consider some strategies to try to avoid
finishing the search in local minima. Many differ-
ent possibilities are presented in the literature, such
as random restarts, tabu searches and annealing meth-
ods, among others. Given the characteristics of our
problem we were led to exploring data perturbation
methods.
The idea is that it might be possible to escape lo-
cal minima if we slightly change the weight given to
each data point. Intuitively, we want to change the
weights of the data in the error function enough to es-
cape local minima while keeping the general shape of
the error function unaltered. A common approach is
to consider a probability distribution over the weights
and make it converge to the uniform distribution so
that we get closer and closer to the original dataset
over time.
Looking back at (14) we can see that the error function can be decomposed as a sum of terms, each corresponding to a pair (i, t):

Error(A, β) = Σ_{i=1}^{p} Σ_t error(i, t, A, β)    (21)
LearningDynamicSystemsfromTime-seriesData-AnApplicationtoGeneRegulatoryNetworks
329
Figure 3: (a) Comparison of the scores of our method (white), for different values of θ, and the top three scorers (black) in the
DREAM4 Challenge. The horizontal dotted line represents the score of the best method at the DREAM4 Challenge.
(b) Comparison of the scores achieved by our method from different starting structures and using different values for θ. Black
bars: using the method of Pinna et al. as initial structure; white bars: using z-score as initial structure.
Figure 4: Percentage improvement of flip-edge with θ = 3.5
and three rounds, over our implementation of the method
of Pinna et al. with θ = 3.0 (the best score from θ ∈ {2, 2.5, 3, 3.5}) for ten new datasets generated from each
original DREAM4 network. The p-values for the t-test on
the improvement for the different networks are 0.00064;
1.43057e-09; 0.00637; 1.88231e-05; 6.6216e-10.
To the decomposition (21) we can add a matrix of weights:

Error(A, β, w) = Σ_{i=1}^{p} Σ_t w_{i,t} error(i, t, A, β).    (22)
4.1 Random Perturbation
For the random perturbation we sample each w_{i,t} from a Gamma distribution (as it is non-negative)

w_{i,t} ∼ Gamma(1/τ, τ)    (23)

where τ is a "temperature" variable that decreases over time. The above distribution has mean equal to one and variance proportional to τ, therefore forcing w_{i,t} to approach one over time.
4.2 Adversarial Perturbation
With the aim of escaping a possible local minimum, it
is possible to employ a more informed strategy when
choosing the weights used to perturb the data. The
general idea, as presented by (Elidan et al., 2002), is
as follows. We aim to modify the error function such
that the current minimum becomes less desirable. In
this way the optimization is able potentially to escape
from the minimum, should it be local. This means that
instead of randomly generating a new weight vector,
we will iteratively update it. In general one can con-
sider:
w_{i,t}^{n+1} = w_{i,t}^{n} + ζ ∂Error/∂w_{i,t}    (24)
so that we increase the relevance of the weights that
contribute more towards an increased value of the er-
ror function. (Elidan et al., 2002) further develop this
idea to ensure convergence towards the uniform dis-
tribution and avoid negative weight values. Follow-
ing their results we have the following weight update
equation
w_{i,t}^{n+1} = α^{n+1} (w_{i,t}^{0})^{κ/(κ+λ)} (w_{i,t}^{n})^{λ/(κ+λ)} exp((ζ/(κ+λ)) ∂Error/∂w_{i,t} |_{w_{i,t}^{n}})    (25)
where α^{n+1} is a normalisation constant and both κ and
λ are inversely proportional to the temperature τ.
In both the random and adversarial approaches,
we start with uniform weights and allow our algo-
rithm to converge to a (possibly local) minimum. We
then select a new set of weights using the appropri-
ate update equation and temperature, and optimise
again starting from the previously found minimum.
This process is repeated several times, updating the
ICPRAM2015-InternationalConferenceonPatternRecognitionApplicationsandMethods
330
temperature at each iteration, and the best minimum
found is returned.
We further extended our method by checking
whether the best solution was in a plateau, and if so
continuing the search in case we were able to escape
the plateau.
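A sketch of the adversarial weight update (25) inside the perturb-and-reoptimise loop described above (illustrative; run_flip_edge and grad_error_w are hypothetical helpers that run the weighted optimisation, returning the structure and its error on the original data, and compute ∂Error/∂w at the current weights):

```python
import numpy as np

def adversarial_update(w, w0, grad, zeta, kappa, lam):
    """One application of eq. (25); renormalised so that the mean weight stays at one."""
    new_w = (w0 ** (kappa / (kappa + lam))) * (w ** (lam / (kappa + lam))) \
            * np.exp((zeta / (kappa + lam)) * grad)
    return new_w / new_w.mean()        # alpha^{n+1} is treated here as a simple normalisation

def perturbed_search(A0, run_flip_edge, grad_error_w, shape, zeta=1.0, tau=0.001, a=0.9, rounds=10):
    """Alternate flip-edge optimisation with adversarial re-weighting of the data (Section 4.2)."""
    w = w0 = np.ones(shape)                           # start from uniform weights
    current_A, current_err = run_flip_edge(A0, w)     # converge to a (possibly local) minimum
    best_A, best_err = current_A, current_err
    for _ in range(rounds):
        kappa = lam = 1.0 / tau                       # kappa and lambda inversely proportional to tau
        w = adversarial_update(w, w0, grad_error_w(current_A, w), zeta, kappa, lam)
        current_A, current_err = run_flip_edge(current_A, w)  # restart from the previous minimum
        if current_err < best_err:                    # keep the best minimum found
            best_A, best_err = current_A, current_err
        tau *= a                                      # temperature update
    return best_A
```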
4.3 Results
All the experiments were run for ten temperature updates where the temperature was reduced by a factor of a (τ ← aτ). For the random perturbation, where w_{i,t} ∼ Gamma(1/τ, τ), we used an initial τ = 1 and a = 0.9, for five runs. For adversarial perturbation we used initial τ = 0.001, a = 0.9, ζ = 1 and κ, λ = 1/τ unless otherwise stated. Note that adversarial perturbation is a deterministic procedure so only one run per parameter configuration was performed.
Figure 5: Score of different runs after including data perturbation. The horizontal line marks the best score achieved without data perturbation. The first box refers to random perturbation and the following three refer to adversarial perturbation: (1) for different ζ = [0.7, 0.8, ..., 1.3]; (2) for different initial τ = [10, 1, ..., 1e−05]; (3) for different temperature fall rates a = [0.1, 0.2, ..., 0.9]. Also plotted are the actual values for each value of ζ, τ and a.
Figure 5 presents the results obtained using the re-
finements proposed above. We can see that they do
not consistently improve on the results obtained us-
ing non-perturbed data though some configurations of
parameters and some runs using random perturbation
indeed improved on the score. Checking for the ex-
istence of plateaus produced no improvement on the
score. On the rare occasions it led to an escape, the search ended at a minimum equivalent to one found for another temperature.
In terms of the value of the error function, as opposed to the DREAM4 score, data perturbation consistently reached lower values. Also of note is the fact
that the correct network, with the data given, has a
higher value on the error function than the structures
found by those methods, meaning that our method
would discard it in favour of the networks it even-
tually found. This could be due to the noise in the
data or to the unavoidable misrepresentation of the
phenomenon by our chosen basis function. Neverthe-
less, all the alternatives studied here are clear win-
ners when compared to the best methods that tackled
the DREAM4 challenge, strongly suggesting that the
use of time series data and these simple, yet effective,
methods is beneficial.
5 FINAL REMARKS
It is important to note that some of the data provided
in the DREAM4 Challenge datasets—namely, infor-
mation on the steady-state expression after the knock
out of each gene in the network—is very unrealistic.
However that data was only used in the creation of
an appropriate prior structure, and all subsequent im-
provement was made using time-series perturbation
data from ten experiments. This data is considerably
more realistic in terms of what might be produced in a
laboratory environment, especially since we were not
informed of the exact characteristics of the perturba-
tion.
It might be argued that with the release of the
information concerning GeneNetWeaver (Schaffter
et al., 2011)—the in silico simulator used to gener-
ate the datasets—and the datasets themselves with the
gold standard, we would have an unfair advantage.
Note, however, that our method is completely independent of the design of GeneNetWeaver and all
the parameters apart from θ—used to create the ini-
tial structure before the time series dataset is used—
are estimated using only the initial dataset.
In terms of application to a real-world scenario, it
is fair to assume that there would be prior knowledge
concerning the organism being studied, and that the
experiments themselves would have a specific goal.
In such a scenario we feel that the prior knowledge
could help in the construction of an appropriate prior
structure (a fairly sparse but high-confidence prior
seems to be favourable to flip-edge). Flip-edge would
then be run on time-series data from appropriately de-
signed experiments. We also feel that knowing the
characteristics of the perturbation could help in mak-
ing the method more efficient.
ACKNOWLEDGEMENTS
Ivo Timoteo is supported by an FCT Individual Doc-
toral Fellowship, number SFRH/BD/88466/2012.
LearningDynamicSystemsfromTime-seriesData-AnApplicationtoGeneRegulatoryNetworks
331
REFERENCES
Ackers, G. K., Johnson, A. D., and Shea, M. A. (1982). Quantitative model for gene regulation by lambda phage repressor. Proceedings of the National Academy of Sciences, 79(4):1129–1133.
Banerjee, O., Ghaoui, L., and D'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485–516.
Barrett, T., Troup, D. B., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., Marshall, K. A., Phillippy, K. H., Sherman, P. M., Muertter, R. N., Holko, M., Ayanbule, O., Yefanov, A., and Soboleva, A. (2011). NCBI GEO: archive for functional genomics data sets - 10 years on. Nucleic Acids Research, 39:D1005–D1010.
Bento, J., Ibrahimi, M., and Montanari, A. (2010). Learning networks of stochastic differential equations. In Lafferty, J. D., Williams, C. K. I., Shawe-Taylor, J., Zemel, R. S., and Culotta, A., editors, NIPS, pages 172–180. Curran Associates, Inc.
Bento, J., Ibrahimi, M., and Montanari, A. (2011). Information theoretic limits on learning stochastic differential equations. In Kuleshov, A., Blinovsky, V., and Ephremides, A., editors, ISIT, pages 855–859. IEEE.
Bonneau, R., Reiss, D. J., Shannon, P., Facciotti, M., Hood, L., Baliga, N. S., and Thorsson, V. (2006). The inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biology, 7(5).
Dream4Challenge (2009). DREAM4, Challenge 2: In Silico Network Challenge. http://wiki.c2b2.columbia.edu/dream/index.php?title=D4c2.
Elidan, G., Ninio, M., Friedman, N., and Schuurmans, D. (2002). Data perturbation for escaping local maxima in learning. In AAAI, pages 132–139.
Gardner, T. S. and Collins, J. J. (2000). Neutralizing noise in gene networks. Nature, 405(6786).
Grant, M. and Boyd, S. (2008). Graph implementations for nonsmooth convex programs. In Blondel, V., Boyd, S., and Kimura, H., editors, Recent Advances in Learning and Control, volume 371 of Lecture Notes in Control and Information Sciences, pages 95–110. Springer London.
Grant, M. and Boyd, S. (2012). CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx.
Klamt, S., Flassig, R., and Sundmacher, K. (2010). TRANSWESD: inferring cellular networks with transitive reduction. Bioinformatics, 26(17):2160–2168.
Koller, D. and Friedman, N. (2009). Probabilistic graphical models: principles and techniques. MIT Press.
Ly, D. L. and Lipson, H. (2012). Learning symbolic representations of hybrid dynamical systems. Journal of Machine Learning Research, 13:3585–3618.
Marbach, D., Prill, R. J., Schaffter, T., Mattiussi, C., Floreano, D., and Stolovitzky, G. (2010). Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences.
Marbach, D., Schaffter, T., Mattiussi, C., and Floreano, D. (2009). Generating realistic in silico gene networks for performance assessment of reverse engineering methods. Journal of Computational Biology, 16(2):229–239.
Pinna, A., Soranzo, N., and de la Fuente, A. (2010). From knockouts to networks: establishing direct cause-effect relationships through graph analysis. PLoS ONE, 5(10):e12912. doi:10.1371/journal.pone.0012912.
Prill, R. J., Marbach, D., Saez-Rodriguez, J., Sorger, P. K., Alexopoulos, L. G., Xue, X., Clarke, N. D., Altan-Bonnet, G., and Stolovitzky, G. (2010). Towards a rigorous assessment of systems biology models: The DREAM3 challenges. PLoS ONE, 5(2):e9202.
Schaffter, T., Marbach, D., and Floreano, D. (2011). GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics, 27(16):2263–2270.
Voortman, M., Dash, D., and Druzdzel, M. (2010). Learning why things change: The difference-based causality learner. In Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI).
Yip, K., Alexander, R., Yan, K., and Gerstein, M. (2010). Improved reconstruction of in silico gene regulatory networks by integrating knockout and perturbation data. PLoS ONE, 5(1):e8121. doi:10.1371/journal.pone.0008121.
ICPRAM2015-InternationalConferenceonPatternRecognitionApplicationsandMethods
332