Using the Ornstein-Uhlenbeck Process for Random Exploration
Johannes Nauta, Yara Khaluf and Pieter Simoens
Department of Information Technology, Ghent University, Ghent, Belgium
Keywords:
Brownian Motion, Exploration, Ornstein-Uhlenbeck Process.
Abstract:
In model-based Reinforcement Learning, an agent aims to learn a transition model between attainable states.
Since the agent initially has zero knowledge of the transition model, it needs to resort to random exploration
in order to learn the model. In this work, we demonstrate how the Ornstein-Uhlenbeck process can be used as
a sampling scheme to generate exploratory Brownian motion in the absence of a transition model. Whereas
current approaches rely on knowledge of the transition model to generate the steps of Brownian motion, the
Ornstein-Uhlenbeck process does not. Additionally, the Ornstein-Uhlenbeck process naturally includes a drift
term originating from a potential function. We show that this potential can be controlled by the agent itself,
allowing it to execute non-equilibrium behavior such as ballistic motion or local trapping.
1 INTRODUCTION
In autonomous and complex systems, exploration is a
necessary component for learning how to act within
an unknown environment (Wilson et al., 1996). Reinforcement Learning (RL) is a widely used approach in which an agent aims to maximize a cumulative, task-specific reward. Specifically, model-based RL includes an alternative (or additional) goal, where the reward is not only task-specific but tied to learning the transition model between attainable states. Once the model is learnt, the agent can exploit this knowledge to plan the most rewarding actions for a given task, which gives model-based learning approaches their generalization capabilities. Since the agent has no prior
knowledge of the environment, it needs to resort to ex-
ploration to search and learn within the environment
before any exploitation can occur.
Essentially, efficient exploration is a search for
novelty, in which the agent should be steered towards
previously unvisited states. Model-based learning
benefits from efficient exploration because to gener-
ate a global transition model all states within the state
space first have to be visited. In model-based RL, the
novelty increases in sparsity as the agent visits more
of the state space. This essentially converts explo-
ration to a problem of sparse target search. Extensive
search for sparse targets via random walks is a widely
studied subject in the field of ecology (Viswanathan
et al., 1999; Bartumeus et al., 2005; Ferreira et al.,
2012). However, these approaches execute random
walks based on knowing the transition model. Since
the transition model is absent in novel environments,
these approaches are insufficient for use as an explo-
ration strategy. We therefore aim to use an efficient
exploration strategy that visits many different attain-
able states using random walks that do not require a
transition model.
As a first step, we wish to generate Brownian motion through action sampling. While Brownian motion as a search strategy is often outperformed by other types of random walks, it is efficient in the case of revisitable targets (James et al., 2010) or in the presence of a bias (Palyulin et al., 2014). Hence, we consider action-driven Brownian motion as a stepping stone for further enhancing exploration in the absence of a transition model. Random walks resulting from action sampling have been described in (Lillicrap et al., 2015), where, similar to our work, the random walk is considered an exploration policy. However, their framework is not suited to model learning, and analytical expressions for the agent's movement are missing.
For a model-learning RL agent, executing Brow-
nian motion is non-trivial, because traditionally the
motion is achieved by sampling displacements from
a desired distribution. For example, in two dimen-
sions an angle and step-length are sampled and the
agent moves accordingly. However, when the transition model is unknown, the agent is unable to determine the action that would result in this target displacement, rendering such sampling procedures inapplicable. We therefore introduce an action-sampling framework based on the Ornstein-Uhlenbeck (OU)
process (Uhlenbeck and Ornstein, 1930). The OU process allows an agent to realize Brownian motion by sampling actions, without access to a transition model. After convergence, the OU process evolves the agent's velocity according to a Langevin equation with normally distributed random forces. The fact that the agent's velocity follows a Gaussian distribution gives rise to the Brownian motion.
Additionally, the OU process naturally encompasses a drift term. This drift term can be formulated such that it originates from potentials induced by the agent itself. This allows the agent to switch to non-equilibrium motion, which holds promise for replicating other types of random walks. Changing the motion is a necessity for efficiency, since optimal search often interchanges local Brownian motion with long-range displacements (Bartumeus et al., 2005; Ferreira et al., 2012). Such active Brownian motion has been studied in biology and physics (Romanczuk et al., 2012), where it results in out-of-equilibrium motion through self-propulsion (Volpe et al., 2014; Basu et al., 2018).
The paper is organized as follows. We first char-
acterize random walks and discuss several metrics in
Section 2. Then, the OU process is described and ana-
lytical expressions for the metrics are obtained in Sec-
tion 3. We additionally introduce the internal drive
and generalize to higher dimensions. The sampling
procedure and numerical details are listed in Sec-
tion 4. Both analytical and empirical results acquired
through simulation are shown in Section 5. The paper
is concluded with a discussion in Section 6.
2 RANDOM WALKS FOR EXPLORATION
Let us consider an agent as a particle within a Eu-
clidean space. Whereas the ultimate goal of the agent
is to learn a transition model that predicts the next
state given an action, this paper focuses on the frame-
work that generates random walks for exploration.
These need to satisfy a number of requirements. First,
the framework needs to be able to handle continuous
state and action spaces, since physical control tasks
are often continuous in both states and actions. Sec-
ond, in the absence of a transition model, the framework
needs to be based on sampling actions. Third, the
framework needs to incorporate an intrinsic drive that
influences the motion of the agent. The OU process
incorporates all three requirements and is therefore an
excellent choice for use as an exploration strategy.
2.1 Metrics
Random walks are often characterized by several met-
rics, mainly the distribution of step-lengths as well
as the mean squared displacement. The step-length
distribution essentially determines the type of ran-
dom walk. Brownian motion is recovered when the
step-lengths are distributed according to a zero-mean
Gaussian. The mean squared displacement $\langle R^2(t) \rangle$ of the agent is an indication of the size of the explored regions of the state space. In one dimension, the mean squared displacement is given by

$\langle R^2(t) \rangle_{v_0} = \langle (x(t) - x_0)^2 \rangle_{v_0},$  (1)

where $v_0$ is the velocity and $x_0$ the position at $t = 0$. The operator $\langle \cdot \rangle_{v_0}$ denotes the ensemble average, which is computed by averaging over many non-interacting agents that all start with the same initial velocity $v_0$.
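To make the ensemble average of Eq. (1) concrete, the following minimal sketch (ours, not part of the original paper; all parameter values are assumptions) estimates the mean squared displacement for an ensemble of one-dimensional Brownian walkers with zero-mean Gaussian step-lengths:

```python
import numpy as np

# Minimal sketch (ours): estimate the ensemble-averaged mean squared
# displacement <R^2(t)> of Eq. (1) for N independent one-dimensional
# Brownian walkers whose step-lengths are zero-mean Gaussian.
rng = np.random.default_rng(0)
N, T, dt = 1000, 10_000, 0.01            # walkers, steps, time-step (assumed)

steps = rng.normal(0.0, np.sqrt(dt), size=(N, T))   # zero-mean Gaussian steps
x = np.cumsum(steps, axis=1)                        # positions, with x_0 = 0

t = dt * np.arange(1, T + 1)
msd = np.mean(x**2, axis=0)              # ensemble average, Eq. (1)
print(msd[-1] / t[-1])                   # ≈ 1: <R^2(t)> grows linearly in t
```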
2.2 Coverage of the State Space
The time evolution of the mean squared displacement
is an indication of the efficiency of the coverage of
the random walk. It corresponds to different types of
diffusion, characterized by a power-law exponent γ,
where

$\langle R^2(t) \rangle_{v_0} \propto t^{\gamma}, \quad \gamma > 0.$  (2)
Normal diffusion has linear scaling of the mean
squared displacement with time, corresponding to
γ = 1. If γ < 1, the agent undergoes subdiffusion
and for γ > 1 superdiffusion arises. Since the mean
squared displacement is an indication of the explored
area of the environment, higher values of γ are gen-
erally preferred when the goal is to increase this cov-
erage. However, one should be careful when drawing conclusions about the random walk based solely on the time evolution of the mean squared displacement. For example, ballistic (straight-line) motion is easily obtained by simply evolving the agent according to x = vt, with a constant velocity v. In this case, it is simple to see that γ = 2. However, intuitively, ballistic motion does not lead to a homogeneous coverage of the explored environment; thus γ can give a wrong indication of the movement of the agent and should be interpreted with care.
3 THE ORNSTEIN-UHLENBECK PROCESS
In this section, we introduce the reader to the OU
process that forms the foundation of the sampling
Figure 1: Ensemble averages of the first (a) and second (b) moments of the velocity for the three-dimensional OU process with uncorrelated Brownian noise, for different values of θ. Mean squared displacements of the same processes are shown in (c). Note the convergence to γ → 1 as t increases. In this case, ~µ = (0,0,0)^T, ~v_0 = (1,1,1)^T, ~x_0 = (0,0,0)^T. Black dotted lines indicate the analytical result.
scheme that is presented in Section 4. The OU pro-
cess is a well-known diffusion process described by a
Langevin equation (Uhlenbeck and Ornstein, 1930).
In one dimension, the position of the agent starting at $x_0$ is given by

$x(t) = x_0 + \int_0^t v(s)\, ds.$  (3)

The velocity is derived from Newton's law and has the following form:

$m \frac{dv(t)}{dt} = -m\theta v(t) + F_U(x) + B(t),$  (4)
where θ denotes a friction coefficient, $F_U(x)$ the external force and $B(t)$ the stochastic force acting on the agent with mass m. The ensemble average of the velocity is given by

$\langle v(t) \rangle_{v_0} = v_0 e^{-\theta t} + \mu \left(1 - e^{-\theta t}\right), \quad \mu = \frac{F_U(x)}{m\theta},$  (5)

which converges to the drift µ in the time limit t → ∞.
The second moment is given by

$\langle v^2(t) \rangle_{v_0} = \left[ v_0 e^{-\theta t} + \mu \left(1 - e^{-\theta t}\right) \right]^2 + \frac{g}{2\theta m^2} \left(1 - e^{-2\theta t}\right),$  (6)

where g is the correlation strength of the stochastic forces, indicating the time range over which the stochastic forces are correlated. In the large-time limit, the second moment equals $\mu^2 + g/(2\theta m^2)$, given that $F_U(x) = F_U$ is constant. In contrast with the first moment, this includes a dependency on the friction coefficient θ in the large-time limit, generating an offset due to friction.
Next we wish to determine the mean squared displacement of the ensemble in the presence of a constant external force:

$\langle R^2(t) \rangle_{v_0} = \left[ \frac{v_0 - \mu}{\theta}\left(1 - e^{-\theta t}\right) + \mu t \right]^2 + \frac{g}{m^2\theta^2} \left[ t + \frac{1}{2\theta}\left( 4e^{-\theta t} - e^{-2\theta t} - 3 \right) \right],$  (7)

which in the large-time limit equals

$\lim_{t\to\infty} \langle R^2(t) \rangle_{v_0} = \left[ \frac{v_0 - \mu}{\theta} + \mu t \right]^2 + \frac{g}{m^2\theta^2} \left[ t - \frac{3}{2\theta} \right].$  (8)
When the drift equals 0, we obtain the famous result of Einstein, namely that the mean squared displacement scales linearly with time, i.e. $\langle R^2(t) \rangle_{v_0} \propto t$ (Einstein, 1905). When the drift is non-zero, we obtain $\langle R^2(t) \rangle_{v_0} \propto t^2$. Furthermore, $v(t)$ is normally distributed when t → ∞, with mean µ and variance $\langle v^2(t) \rangle_{v_0} - \langle v(t) \rangle_{v_0}^2$. As a result, displacements of the position $x(t)$ are normally distributed, effectively replicating Brownian motion when µ = 0.
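As a quick numerical sanity check of Eqs. (5) and (6), the following sketch (ours; the Euler discretization and all parameter values are our assumptions, not the paper's) evolves an ensemble of one-dimensional OU velocities and compares the empirical moments against the analytical expressions:

```python
import numpy as np

# Sketch (ours): compare empirical velocity moments of a simulated 1-D OU
# process against the analytical expressions of Eqs. (5) and (6), with m = 1.
rng = np.random.default_rng(1)
N, dt, steps = 10_000, 0.01, 1000        # ensemble size, time-step, steps
theta, mu, g, v0 = 0.5, 0.2, 0.01, 1.0   # friction, drift, noise strength
F = mu * theta                           # constant force, so mu = F/(m*theta)

v = np.full(N, v0)
for _ in range(steps):
    # Euler step of Eq. (4): dv = (-theta*v + F) dt + dB, with <B^2> = g
    v += (-theta * v + F) * dt + rng.normal(0.0, np.sqrt(g * dt), size=N)

t = steps * dt
mean_pred = v0 * np.exp(-theta * t) + mu * (1 - np.exp(-theta * t))  # Eq. (5)
var_pred = g / (2 * theta) * (1 - np.exp(-2 * theta * t))            # from Eq. (6)
print(v.mean(), mean_pred)               # first moment vs. Eq. (5)
print(v.var(), var_pred)                 # variance vs. Eq. (6) minus Eq. (5)^2
```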
3.1 Origin of Internal Drift
In physics, the drift µ is typically regarded as an ex-
trinsic effect, uncontrollable by the agent. However,
it is important to note that the sampling of the forces
(and thus the velocities) is performed by the agent
itself. Thus, applying an external drift to a passive
agent is the same as applying an internal drift to an
active agent. As motivated in the introduction, this in-
ternal drift can act similar to a curiosity signal (Hafez
et al., 2017), where the agent can undergo a drastic
transition or stay close to its current location depend-
ing on its intentions.
Figure 2: Velocity distributions for the one-dimensional OU process at T = n∆t, for different values of θ, with µ = 0, v_0 = 1, m = 1. Note that the distribution is Gaussian with mean $\langle v(T) \rangle_{v_0} = \mu$ and variance $\langle v^2(T) \rangle_{v_0} - \langle v(T) \rangle_{v_0}^2 = g/(2\theta m^2)$.
Analytically, $F_U$ can be written as the derivative of a potential $U(x,t)$, giving us the Langevin equation for the velocity as

$m \frac{dv(t)}{dt} = -m\theta v(t) - \frac{d}{dx} U(x,t) + B(t),$  (9)
where the minus sign appears through the resulting intrinsic force $F_U = -dU/dx$. This equation indicates that the drift can be both position-dependent and time-dependent. Note that the analysis performed to compute the ensemble averages only holds when the potential is position-dependent, not time-dependent. Since the velocity converges to $\mu = F_U/(m\theta)$ in the large-time limit (see Eq. (5)), an agent can thus actively induce non-equilibrium behavior by manipulating the self-induced potential. For instance, an appropriate choice of potential will allow the agent to "trap" itself in a region or, conversely, to realize ballistic (straight-line) motion. In the remainder of this paper, we shall only consider position-dependent potentials.
3.2 Generalization to Multiple Dimensions
In RL, action spaces are often high-dimensional. For
example, a robotic arm can have multiple joints on
which forces can be exerted. Therefore, if we wish to
enable model-based learning for agents with many de-
grees of freedom, we need to extend the OU process
to higher dimensions. This generalization is straightforward by considering the n-dimensional Langevin equation

$\vec{x}(t) = \vec{x}_0 + \int_0^t \vec{v}(s)\, ds,$  (10)

$m \frac{d\vec{v}(t)}{dt} = -m\theta \vec{v}(t) + \vec{F}_U + \vec{B}(t).$  (11)
Solving the Langevin equation and computing the ensemble average again gives

$\langle \vec{v}(t) \rangle = \vec{v}_0 e^{-\theta t} + \vec{\mu}\left(1 - e^{-\theta t}\right), \quad \vec{\mu} = \frac{\vec{F}_U}{m\theta}.$  (12)

When t → ∞, the ensemble average of the velocity again converges to the external drift ~µ, since for each dimension it holds that $\lim_{t\to\infty} \langle v_i(t) \rangle_{\vec{v}_0} = \mu_i$.
The mean squared displacement depends on the correlation matrix $\Sigma^{v(t)}$. The random forces are considered uncorrelated if the noise vector $\vec{B}(t) = (B_1(t), B_2(t), \ldots, B_n(t))$ is an n-dimensional vector consisting of independent Wiener processes (Ibe, 2013), i.e.

$\langle B_i(t) B_j(t') \rangle = g\, \delta_{ij}\, \delta(t - t'),$  (13)
where $\delta_{ij}$ is the Kronecker delta and $\delta(t - t')$ the Dirac delta function. The covariance matrix of the velocity is given by

$\Sigma^{v(t)}_{ij} = \frac{g\, \delta_{ij}}{2\theta m^2} \left(1 - e^{-2\theta t}\right),$  (14)
which is a diagonal matrix. Thus our n-dimensional velocity is distributed according to a multivariate Gaussian distribution with mean ~µ and diagonal covariance matrix $\Sigma^{v(t)}$. For computing the mean squared displacement we define $\vec{r}(t) = \vec{x}(t) - \vec{x}_0$, which gives

$R^2(t) = |\vec{r}(t)|^2 = \sum_{i=1}^{n} r_i^2(t),$  (15)

which is simply the sum of the squared displacements in each dimension (i.e. the square of the Euclidean distance between $\vec{x}(t)$ and $\vec{x}_0$). Substituting the velocity of Eq. (11) and squaring, we have

$\langle R^2(t) \rangle_{\vec{v}_0} = \sum_{i=1}^{n} \left\{ \left[ \frac{v_{0,i} - \mu_i}{\theta}\left(1 - e^{-\theta t}\right) + \mu_i t \right]^2 + \frac{g}{m^2\theta^2}\left[ t + \frac{1}{2\theta}\left( 4e^{-\theta t} - e^{-2\theta t} - 3 \right) \right] \right\},$  (16)
which is the sum of the mean squared displacements in all dimensions, where each dimension has the same expression as Eq. (7). This result is expected, since we defined the n-dimensional noise vector to have independent components. Therefore, each dimension undergoes Brownian motion, with the total mean squared displacement being the sum of the displacements in each dimension. Brownian motion is often modelled using correlated noise (i.e. non-diagonal elements of the covariance matrix are non-zero); however, a careful analysis of such systems is left for future work.
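As a short numerical illustration of the uncorrelated-noise construction (our sketch; all parameter values are assumed), independent Wiener increments per dimension yield an approximately diagonal empirical velocity covariance, matching Eq. (14):

```python
import numpy as np

# Sketch (ours): n-dimensional OU velocities of Eq. (11) with independent
# noise components per Eq. (13); the empirical covariance should approach
# the diagonal matrix of Eq. (14), with g/(2*theta*m^2) on the diagonal.
rng = np.random.default_rng(2)
n, N, steps = 3, 2000, 5000              # dimensions, agents, time steps
dt, theta, g, m = 0.01, 1.0, 0.01, 1.0   # parameters (assumed), F_U = 0

v = np.ones((N, n))                      # all agents start at v_0 = (1,...,1)
for _ in range(steps):
    noise = rng.normal(0.0, np.sqrt(g * dt), size=(N, n))  # independent B_i
    v += -theta * v * dt + noise / m

cov = np.cov(v.T)                        # empirical velocity covariance
print(np.diag(cov), g / (2 * theta * m**2))   # diagonal ≈ g/(2*theta*m^2)
print(cov - np.diag(np.diag(cov)))            # off-diagonal entries ≈ 0
```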
4 SAMPLING PROCEDURE
To realize Brownian motion in the state space S at discretized points in time with time-steps ∆t, an agent with mass m = 1 samples forces ξ from a normal distribution with mean 0 and standard deviation ∆t, i.e. ξ ∼ N(0, ∆t). This results in a time-discretization of Eq. (10) and Eq. (11) for the velocity and position in each dimension i:

$v_{t+\Delta t,\, i} = v_{t,i}\left(1 - \theta \Delta t\right) + \mu_i\, \theta \Delta t + \xi_{t,i},$  (17)

$x_{t+\Delta t,\, i} = x_{t,i} + v_{t+\Delta t,\, i}\, \Delta t.$  (18)
Setting the standard deviation of $\xi_{t,i}$ to ∆t also defines the correlation strength of the stochastic driving force, namely $\langle B^2(t) \rangle = g = \Delta t$. This means that the stochastic forces are only correlated within a time range ∆t, defining a Wiener process. Ensemble averages of the (squared) velocity and the mean squared displacement are computed over an ensemble of N = 1000 non-interacting agents, unless stated otherwise. Since our interest lies mostly in the large-time limit, we evolve Eq. (17) and Eq. (18) for n = 10^6 steps, with ∆t = 0.01. In multiple dimensions, we compute the ensemble average of the absolute value of the velocity, i.e. the length of the velocity vector,
$|\vec{v}(t)| = \sqrt{\sum_{i=1}^{D} v_i^2(t)}, \qquad |\vec{v}(t)|^2 = \sum_{i=1}^{D} v_i^2(t),$  (19)
where D is the number of dimensions. The power-law exponent of the mean squared displacement can be numerically approximated by discretizing time with increments s:

$\gamma(t) \approx \frac{\log \langle R^2(t+s) \rangle_{v_0} - \log \langle R^2(t) \rangle_{v_0}}{\log(t+s) - \log(t)},$  (20)

where log(·) is the natural logarithm.
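The procedure above can be summarized in a short simulation sketch (our implementation, not the authors' code): the ensemble is evolved with the update rules of Eqs. (17) and (18), and γ(t) is estimated via Eq. (20). The step count is reduced relative to Section 4 so the example runs quickly.

```python
import numpy as np

# Sketch (ours) of the sampling procedure: evolve an ensemble of N agents
# with Eqs. (17)-(18), then estimate the exponent gamma(t) via Eq. (20).
rng = np.random.default_rng(3)
N, n_steps, dt = 1000, 100_000, 0.01     # N as in Section 4; fewer steps here
theta, mu = 1.0, 0.0                     # friction strength, zero drift

v = np.ones(N)                           # v_0 = 1 for every agent
x = np.zeros(N)                          # x_0 = 0
msd = np.empty(n_steps)
for k in range(n_steps):
    xi = rng.normal(0.0, dt, size=N)     # std = Delta t, i.e. g = Delta t
    v = v * (1.0 - theta * dt) + mu * theta * dt + xi    # Eq. (17)
    x = x + v * dt                                       # Eq. (18)
    msd[k] = np.mean(x**2)               # <R^2(t)>, since x_0 = 0

t = dt * np.arange(1, n_steps + 1)
s = 1000                                 # increment of Eq. (20), in steps
gamma = ((np.log(msd[s:]) - np.log(msd[:-s]))
         / (np.log(t[s:]) - np.log(t[:-s])))
print(gamma[-1])                         # -> 1 for large t: Brownian motion
```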
Figure 3: Power-law exponent γ of the three-dimensional OU process with uncorrelated Brownian noise, for µ = 0, v_0 = 1, x_0 = 0. For large t, γ → 1, indicating standard diffusion, i.e. Brownian motion. Black dotted lines indicate the analytical result. Note that as t → ∞, γ → 1 regardless of θ. Large deviations are due to the numerical computation of the derivative.
5 RESULTS
We verify the analytical results and build upon them to show that active particle steering can effectively guide the agent into non-equilibrium behavior. In Section 5.1, we first validate that we are indeed able to realize Brownian motion through action sampling. In Section 5.2, we show that an agent can reshape its exploration trajectory through the state space by applying different potentials.
5.1 Brownian Motion
We investigate numerical simulations of the three-dimensional OU process in the absence of a drift term (µ = 0), described through Eqs. (17) and (18). We aim to verify that each agent indeed undergoes Brownian motion by measuring the metrics introduced in Section 2.
5.1.1 Velocity Distribution
The velocity distribution is defined through the first
and second moment. The results for both moments
are displayed in Figs. 1a, 1b. Excellent agreement
between the analytical and numerical solutions is ob-
served. For large t, the first moment of the veloc-
ity tends towards the extrinsic drift µ = 0, whereas
the second moment converges to a value that depends
on θ. For both moments, the strength θ determines
Figure 4: Sample trajectories of the one-dimensional Brownian particle in different potentials, for different strengths θ of the OU process, with n = 10^5 steps and ∆t = 0.01. The arrows indicate the potential U(x). (a) U(x) = ax, for x_0 = 0, a = 0.01. (b) U(x) = ax², for x_0 = 10, a = 0.01.
the convergence time of the process through a typical time τ = θ^{-1}. In the large-time limit, the velocity is Gaussian (Uhlenbeck and Ornstein, 1930) around µ = 0. This is indicated in Fig. 2, where the empirical velocity at T = n∆t is indeed Gaussian. This means that, for large t, the agent is sampling from a (multivariate) Gaussian.
5.1.2 Mean Squared Displacement
The mean squared displacement is plotted in Fig. 1c.
The analytical and numerical results are again in ex-
cellent agreement. The power-law coefficient of the
mean squared displacement, calculated by means of
Eq. (20), is plotted in Fig. 3. Note the transition between super-linear scaling (γ > 1) and linear scaling (γ = 1) around the typical time τ = θ^{-1}. Furthermore, for large θ, there exists a time window of subdiffusion (γ < 1) arising from the slowing down of the agents due to the large friction coefficient θ.
5.2 Active Motion
Next we consider two elementary potential functions U(x) to demonstrate how an agent can induce other types of (non-equilibrium) motion. Ballistic motion induces large displacements, which is desirable when the agent wishes to visit faraway regions of the state space. In contrast, trapping the agent around a certain state enables local exploration, where movement is bound to a small area. Combining large displacements and local trapping can give rise to continuous-time random walks (Volpe and Volpe, 2017). Additionally, the combination of these two potentials may enable the replication of many different continuous-time random walks, as these are often a combination of local Brownian motion and long, correlated movement (Zaburdaev et al., 2015). For visual purposes, we illustrate all results in one dimension; the generalization to multiple dimensions is straightforward (see Section 4).
5.2.1 Linear Potential
Let us consider a linear potential U = ax, such that

$\frac{dv(t)}{dt} = -\theta v(t) - a + B(t).$  (21)
A sample trajectory is shown in Fig. 4a. For lower values of θ, the agent experiences less friction and thus larger deviations are observed. This reflects the trivial result that a (nearly) frictionless particle exhibits larger displacements within the same time. In the large-time limit, we know that the ensemble mean of the velocity converges to the drift µ = -a/θ (see Eq. (5)). As a result, the Brownian particle moves with a close-to-constant velocity as time increases (see Figs. 6a, 6b), producing the visible straight-line displacement with respect to time seen in Fig. 4. Thus, by using a linear potential, we are able to actively steer the agent to undergo ballistic motion corresponding to x = vt, with a constant velocity µ = -a/θ in the large-time limit. As shown in Fig. 5a, the mean squared displacement indeed evolves according to γ → 2 for all θ when t is large.
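As a hedged illustration (our sketch, reusing the discretization of Section 4; the constant force F_U = -a of the linear potential enters as the drift µ = -a/θ of Eq. (5), and all parameter values are assumptions):

```python
import numpy as np

# Sketch (ours): a single agent in the linear potential U(x) = a*x, i.e. a
# constant force F_U = -a and large-time drift mu = -a/theta (Eq. (5)).
rng = np.random.default_rng(4)
dt, n_steps = 0.01, 100_000
theta, a = 0.1, 0.01
mu = -a / theta                          # ballistic velocity at large t

v, x = 0.5, 0.0                          # v_0 = 1/2, x_0 = 0 (as in Fig. 5)
for _ in range(n_steps):
    xi = rng.normal(0.0, dt)                             # g = Delta t
    v = v * (1.0 - theta * dt) + mu * theta * dt + xi    # Eq. (17)
    x += v * dt                                          # Eq. (18)

# Ballistic motion: x ≈ mu*t up to diffusive fluctuations, so <R^2> ~ t^2.
print(x, mu * n_steps * dt)
```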
Figure 5: Mean squared displacement of the OU process in different potentials, for different values of θ. Here, v_0 = 1/2, x_0 = 0, a = 0.01. (a) Given a linear potential U = ax, the mean squared displacement scales according to γ → 2. (b) Trapping of the particle for U = ax² (γ = 0), where θ influences the average displacement at which the particle becomes trapped.
5.2.2 Quadratic Potential
Let us consider a quadratic potential U = ax², such that

$\frac{dv(t)}{dt} = -\theta v(t) - 2ax + B(t).$  (22)
By applying this potential, we expect that the agent traps itself close to the minimum of the potential at x = 0, with θ again describing the influence of friction. High friction results in fast trapping of the agent, whereas low friction displays fluctuations around the minimum of the potential, with significantly larger times necessary before effective trapping occurs (see Fig. 4). For small θ, the agent can drift further away from the minimum of the potential due to overshooting it. The mean squared displacement is plotted in Fig. 5 for different values of θ. After some time, the agent indeed becomes trapped, indicated by a stagnation of the mean squared displacement and the mean velocity (see Figs. 5b, 6c). The friction coefficient of the OU process encodes deviations from the mean, meaning that small values of θ yield a higher variance of the velocity distribution, as shown in Fig. 6d. This induces, on average, larger displacements from the minimum of the potential at x = 0. The first and second moments of the velocity converge to different values for different θ.
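A corresponding sketch for the quadratic potential (ours; the restoring force F_U(x) = -2ax of Eq. (22) enters the Euler step directly, since the drift is now position-dependent, and the parameter values are assumptions):

```python
import numpy as np

# Sketch (ours): a single agent in the quadratic potential U(x) = a*x**2,
# with position-dependent restoring force F_U(x) = -2*a*x as in Eq. (22).
rng = np.random.default_rng(5)
dt, n_steps = 0.01, 100_000
theta, a = 1.0, 0.01

v, x = 0.5, 10.0                     # start away from the minimum, x_0 = 10
xs = np.empty(n_steps)
for k in range(n_steps):
    xi = rng.normal(0.0, dt)                         # g = Delta t
    v += (-theta * v - 2.0 * a * x) * dt + xi        # Euler step of Eq. (22)
    x += v * dt                                      # Eq. (18)
    xs[k] = x

# Trapping: the agent fluctuates around x = 0 with bounded variance.
print(xs[-10_000:].mean(), xs[-10_000:].std())
```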
Figure 6: Ensemble averages of the first and second moments of the velocity for the one-dimensional OU process in different potentials, for different values of θ. Colours corresponding to different θ are shown in the legend in (a). In all cases, v_0 = 1/2, x_0 = 0, a = 0.01. (a) + (b): Linear potential U = ax. (c) + (d): Quadratic potential U = ax².
6 CONCLUSION
Learning a state transition model is a prerequisite of any model-based RL control paradigm. To learn such a model, an agent must efficiently explore the state space. In this paper, we presented an approach based on the OU process to realize active guidance of an agent through the state space by sampling velocities instead of displacements. We have assumed zero knowledge of the transition model, since this is generally the case in most model-based RL settings. The OU process evolves the agent's velocity according to a Langevin equation, where in the large-time limit the sampled velocities follow a Gaussian distribution. Additionally, the model allows the agent to influence the action sampling scheme (and thus its motion pattern) by means of a self-induced potential function. One key advantage of our approach is that we can derive closed-form analytical expressions.
In this paper, we have assumed that the transition model remains unknown, even after the agent has explored the environment for some time. However, when model-based learning is considered, the agent often builds its knowledge in an incremental, iterative fashion. To account for this, in future work we will study the effects of making the strength θ time-dependent, as well as changing the intrinsic drift term µ in reaction to encountered novelty. This generates a framework wherein the intrinsic drive originates from extrinsic sources or observations, resembling an intuitive implementation of a curious agent.
Furthermore, acquiring similar analytical expressions for different types of random walks is highly desirable. In particular, we wish to focus on Lévy walks (Zaburdaev et al., 2015). In a Lévy walk, the displacements are sampled from a power law, interchanging local displacements with long, time-correlated displacements within the environment. Using different potentials, one can most likely replicate Lévy-like behavior through the process described in this work. One could alternatively use a different formulation of the underlying noise scheme, i.e. sample directly from the desired distribution. This could give rise to Lévy walks and might further enhance exploration of an environment (Bartumeus et al., 2005; Ferreira et al., 2012).
This work constitutes a stepping stone towards simulating random walks for exploration. Enabling random walks in the absence of a transition model might prove beneficial for model-based RL, even opening the door to more efficient sampling schemes that improve learning in continuous state spaces.
REFERENCES
Bartumeus, F., da Luz, M. G. E., Viswanathan, G. M., and
Catalan, J. (2005). Animal search strategies: a quan-
titative random-walk analysis. Ecology, 86(11):3078–
3087.
Basu, U., Majumdar, S. N., Rosso, A., and Schehr, G. (2018). Active Brownian motion in two dimensions. arXiv preprint arXiv:1804.09027.
Einstein, A. (1905). Investigations on the theory of the Brownian movement. Ann. der Physik.
Ferreira, A., Raposo, E., Viswanathan, G., and da Luz, M. (2012). The influence of the environment on Lévy random search efficiency: Fractality and memory effects. Physica A: Statistical Mechanics and its Applications, 391(11):3234–3246.
Hafez, M. B., Weber, C., and Wermter, S. (2017). Curiosity-
driven exploration enhances motor skills of continu-
ous actor-critic learner. In 2017 Joint IEEE Interna-
tional Conference on Development and Learning and
Epigenetic Robotics (ICDL-EpiRob), pages 39–46.
Ibe, O. C. (2013). Elements of Random Walk and Diffusion
Processes. Wiley Publishing, 1st edition.
James, A., Pitchford, J. W., and Plank, M. (2010). Efficient or inaccurate? Analytical and numerical modelling of random search strategies. Bulletin of Mathematical Biology, 72(4):896–913.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T.,
Tassa, Y., Silver, D., and Wierstra, D. (2015). Contin-
uous control with deep reinforcement learning. CoRR,
abs/1509.02971.
Palyulin, V. V., Chechkin, A. V., and Metzler, R. (2014). Lévy flights do not always optimize random blind search for sparse targets. Proceedings of the National Academy of Sciences, 111(8):2931–2936.
Romanczuk, P., Bär, M., Ebeling, W., Lindner, B., and Schimansky-Geier, L. (2012). Active Brownian particles. The European Physical Journal Special Topics, 202(1):1–162.
Uhlenbeck, G. E. and Ornstein, L. S. (1930). On the theory of the Brownian motion. Phys. Rev., 36:823–841.
Viswanathan, G. M., Buldyrev, S. V., Havlin, S., Da Luz,
M., Raposo, E., and Stanley, H. E. (1999). Op-
timizing the success of random searches. Nature,
401(6756):911.
Volpe, G., Gigan, S., and Volpe, G. (2014). Simulation of the active Brownian motion of a microswimmer. American Journal of Physics, 82(7):659–664.
Volpe, G. and Volpe, G. (2017). The topography of the en-
vironment alters the optimal search strategy for active
particles. Proceedings of the National Academy of
Sciences, 114(43):11350–11355.
Wilson, S. W. et al. (1996). Explore/exploit strategies in
autonomy. In Proc. of the Fourth International Con-
ference on Simulation of Adaptive Behavior: From
Animals to Animats, volume 4, pages 325–332.
Zaburdaev, V., Denisov, S., and Klafter, J. (2015). Lévy walks. Reviews of Modern Physics, 87(2):483.