Bayesian Quadrature in Nonlinear Filtering
Jakub Prüher and Miroslav Šimandl
European Centre of Excellence - New Technologies for the Information Society,
Faculty of Applied Sciences, University of West Bohemia, Univerzitní 18, Pilsen, Czech Republic
Keywords:
Nonlinear Filtering, Bayesian Quadrature, Gaussian Process.
Abstract:
The paper deals with the state estimation of nonlinear stochastic discrete-time systems by means of quadrature-based filtering algorithms. The algorithms use quadrature to approximate the moments given by integrals. The aim is to evaluate these integrals by Bayesian quadrature. Bayesian quadrature perceives the integral itself as a random variable, on which inference is performed by conditioning on the function evaluations. An advantage of this approach is that, in addition to the value of the integral, the variance of the integral is also obtained. In this paper, we improve the estimation of covariances in quadrature-based filtering algorithms by taking the integral variance into account. The proposed modifications are applied to the Gauss-Hermite Kalman filter and the unscented Kalman filter algorithms. Finally, the performance of the modified filters is compared with the unmodified versions in numerical simulations. The modified versions of the filters exhibit significantly improved estimate credibility and a comparable root-mean-square error.
1 INTRODUCTION
Dynamic systems are widely used to model behaviour
of real processes throughout the sciences. In many
cases, it is useful to define a state of the system
and consequently work with a state-space represen-
tation of the dynamics. When the dynamics exhibits
stochasticity or can only be observed indirectly, the
problem of state estimation becomes relevant. Estimating the state of a dynamic system from noisy measurements is a prevalent problem in many application areas, such as aircraft guidance, GPS navigation (Grewal et al., 2007), weather forecasting (Gillijns et al., 2006), telecommunications (Jiang et al., 2003) and time series analysis (Bhar, 2010). When the state estimator is required to produce an estimate using only the present and past measurements, this is known as the filtering problem.
For discrete-time linear Gaussian systems, the best estimator in the mean-square-error sense is the much-celebrated Kalman filter (KF) (Kalman, 1960). First attempts to deal with the estimation of nonlinear dynamics can be traced to the work of (Smith et al., 1962), which resulted in the extended Kalman filter (EKF). The EKF algorithm uses the Taylor series expansion to approximate the nonlinearities in the system description. A disadvantage of the Taylor series is that it requires differentiability of the approximated functions. This prompted further development (Nørgaard et al., 2000; Šimandl and Duník, 2009) resulting in the derivative-free filters based on Stirling's interpolation formula. Other approaches that approximate nonlinearities include the Fourier-Hermite KF (Sarmavuori and Särkkä, 2012), a special case of which is the statistically linearized filter (Maybeck, 1982; Gelb, 1974).
Instead of explicitly dealing with nonlinearities
in the system description, the unscented Kalman fil-
ter (UKF) (Julier et al., 2000) describes the densities
by a finite set of deterministically chosen σ-points,
which are then propagated through the nonlinearity.
Other filters, such as the Gauss-Hermite Kalman filter
(GHKF) (Ito and Xiong, 2000), the cubature Kalman
filter (CKF) (Arasaratnam and Haykin, 2009) and the
stochastic integration filter (Duník et al., 2013), uti-
lize numerical quadrature rules to approximate mo-
ments of the relevant densities. These filters can
be seen as representatives of a more general σ-point
methodology.
A limitation of classical integral approximations,
such as the Gauss-Hermite quadrature (GHQ), is that
they are specifically designed to perform with zero
error on a narrow class of functions (typically poly-
nomials up to a given degree (Särkkä, 2013)). It is
also possible to design rules that have the best average-case performance on a wider range of functions at the cost of permitting a small non-zero error (Minka, 2000). In recent years, Bayesian quadrature (BQ)
has become a focus of interest in the probabilistic numerics community (Osborne et al., 2012). BQ treats numerical integration as a problem of Bayesian inference and is thus able to provide additional information, namely, the uncertainty in the computation of the integral itself. In (Särkkä et al., 2014), the authors work with the concept of BQ, but the algorithms derived therein do not make use of the uncertainty in the integral computations.
The goal of this paper is to augment the current
σ-point algorithms so that the uncertainty associated
with the integral approximations is also reflected in
their estimates.
The rest of the paper is organized as follows. A formal definition of the Gaussian filtering problem is given in Section 2, followed by an exposition of the basic idea of Bayesian quadrature in Section 3. The main contribution, the design of the Bayes-Hermite Kalman filter (BHKF), is presented in Section 4. Finally, the BHKF is compared with existing filters in Section 5.
2 PROBLEM FORMULATION
The discrete-time stochastic dynamic system is described by the following state-space model

$$x_k = f(x_{k-1}) + q_{k-1}, \qquad q_{k-1} \sim \mathrm{N}(0, Q), \tag{1}$$
$$z_k = h(x_k) + r_k, \qquad\qquad r_k \sim \mathrm{N}(0, R), \tag{2}$$

with initial conditions $x_0 \sim \mathrm{N}(m_0, P_0)$, where $x_k \in \mathbb{R}^n$ is the system state evolving according to the known nonlinear dynamics $f : \mathbb{R}^n \to \mathbb{R}^n$ perturbed by the white state noise $q_{k-1} \in \mathbb{R}^n$. The measurement $z_k \in \mathbb{R}^p$ is a result of applying the known nonlinear transformation $h : \mathbb{R}^n \to \mathbb{R}^p$ to the system state and adding the white measurement noise $r_k \in \mathbb{R}^p$. Mutual independence is assumed between the state noise $q_k$, the measurement noise $r_k$ and the system initial condition $x_0$ for all $k \geq 1$.
The filtering problem is concerned with the determination of the probability density function $p(x_k \mid z_{1:k})$, where the shorthand $z_{1:k}$ stands for the sequence of measurements $z_1, z_2, \ldots, z_k$. The general solution to the filtering problem is given by the Bayesian recursive relations in the form of density functions

$$p(x_k \mid z_{1:k}) = \frac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}, \tag{3}$$

with the predictive density $p(x_k \mid z_{1:k-1})$ given by the Chapman-Kolmogorov equation

$$p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, \mathrm{d}x_{k-1}. \tag{4}$$

In this paper, the integral computation is assumed to take place over the support of $x_{k-1}$. The likelihood term $p(z_k \mid x_k)$ in (3) is determined by the measurement model (2) and the transition probability $p(x_k \mid x_{k-1})$ in (4) by the dynamics model (1).
For tractability reasons, the Gaussian filters make the simplifying assumption that the joint density of state and measurement $p(x_k, z_k \mid z_{1:k-1})$ is of the form

$$\mathrm{N}\!\left( \begin{bmatrix} x_{k|k-1} \\ z_{k|k-1} \end{bmatrix} \,\middle|\, \begin{bmatrix} m^x_{k|k-1} \\ m^z_{k|k-1} \end{bmatrix}, \begin{bmatrix} P^x_{k|k-1} & P^{xz}_{k|k-1} \\ P^{zx}_{k|k-1} & P^z_{k|k-1} \end{bmatrix} \right). \tag{5}$$

Knowledge of the moments in (5) is fully sufficient (Deisenroth and Ohlsson, 2011) to express the first two moments, $m^x_{k|k}$ and $P^x_{k|k}$, of the conditional density $p(x_k \mid z_{1:k})$ using the conditioning formula for Gaussians as

$$m^x_{k|k} = m^x_{k|k-1} + K_k \left( z_k - m^z_{k|k-1} \right), \tag{6}$$
$$P^x_{k|k} = P^x_{k|k-1} - K_k P^z_{k|k-1} K_k^\top, \tag{7}$$

with the Kalman gain defined as $K_k = P^{xz}_{k|k-1} \left( P^z_{k|k-1} \right)^{-1}$.
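For illustration, the update (6)-(7) translates directly into code. The following is a minimal NumPy sketch (the function name and interface are ours, purely illustrative):

```python
import numpy as np

def gaussian_update(m_x, P_x, m_z, P_z, P_xz, z):
    """Measurement update (6)-(7): condition the joint Gaussian (5) on z.

    A minimal sketch; the inputs are assumed to be the predictive moments
    m^x_{k|k-1}, P^x_{k|k-1}, m^z_{k|k-1}, P^z_{k|k-1}, P^xz_{k|k-1}.
    """
    K = P_xz @ np.linalg.inv(P_z)   # Kalman gain K_k
    m_f = m_x + K @ (z - m_z)       # filtered mean (6)
    P_f = P_x - K @ P_z @ K.T       # filtered covariance (7)
    return m_f, P_f
```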
The problem of computing the moments in (5) can be seen, on a general level, as the computation of moments of a transformed random variable

$$y = g(x), \tag{8}$$

where $g$ is a nonlinear vector function. This invariably entails evaluation of integrals of the following kind

$$\mathbb{E}[y] = \int g(x)\, p(x)\, \mathrm{d}x \tag{9}$$

with Gaussian $p(x)$. Since the integral is typically intractable, σ-point algorithms resort to approximations based on a weighted sum of function evaluations

$$\int g(x)\, p(x)\, \mathrm{d}x \approx \sum_{i=1}^{N} w_i\, g\big(x^{(i)}\big). \tag{10}$$

The evaluation points $x^{(i)}$ are also known as the σ-points, hence the name.
Thus, for instance, to compute $m^x_{k|k-1}$, $P^x_{k|k-1}$ and $P^{xz}_{k|k-1}$, the following expressions, given in matrix notation, are used

$$m^x_{k|k-1} \approx F^\top w, \tag{11}$$
$$P^x_{k|k-1} \approx \tilde{F}^\top W \tilde{F}, \tag{12}$$
$$P^{xz}_{k|k-1} \approx \tilde{X}^\top W \tilde{F}, \tag{13}$$

where the weights are now $w = [w_1, \ldots, w_N]^\top$, $W = \mathrm{diag}([w_1, \ldots, w_N])$, and the $i$-th rows of $F$, $\tilde{F}$ and $\tilde{X}$ are defined as the transposes of $f\big(x^{(i)}_{k-1}\big)$, $f\big(x^{(i)}_{k-1}\big) - m^x_{k|k-1}$ and $x^{(i)}_{k-1} - m^x_{k|k-1}$, respectively.
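The expressions (11)-(13) amount to a few matrix products. A generic sketch, assuming a diagonal $W = \mathrm{diag}(w)$ as in the classical rules and following the definition of $\tilde{X}$ as stated above:

```python
import numpy as np

def sigma_point_moments(f, X, w):
    """σ-point moment approximations (11)-(13) with diagonal W = diag(w).

    X: (N, n) array of σ-points x^(i)_{k-1}; w: (N,) weight vector;
    f maps R^n to R^n. Names are illustrative, not from the paper.
    """
    F = np.array([f(x) for x in X])  # i-th row is f(x^(i))^T
    m = F.T @ w                      # predictive mean (11)
    W = np.diag(w)
    Ft = F - m                       # rows of F-tilde
    P = Ft.T @ W @ Ft                # predictive covariance (12)
    Xt = X - m                       # rows x^(i) - m, as stated for (13)
    P_xz = Xt.T @ W @ Ft             # cross-covariance (13)
    return m, P, P_xz
```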
BayesianQuadratureinNonlinearFiltering
381
All the information a quadrature rule has about the function behaviour is conveyed by the N function values $g\big(x^{(i)}\big)$. Conversely, this means that any quadrature is uncertain about the true function values in between the σ-points. The importance of quantifying this uncertainty becomes particularly pronounced when the function is not integrated exactly due to the inherent design limitations of the quadrature (such as the choice of weights and σ-points). All σ-point filters thus operate with an uncertainty that is not accounted for in their estimates. The classical treatment of quadrature does not lend itself nicely to the quantification of the uncertainty associated with a given rule. On the other hand, the Bayesian quadrature, which treats the integral approximation as a problem in Bayesian inference, is perfectly suited for this task.

The idea of using Bayesian quadrature in state estimation algorithms was already treated in (Särkkä et al., 2014). The derived filters and smoothers, however, do not fully utilize the potential of the Bayesian quadrature. Namely, the variance of the integral is not reflected in their estimates, which remains a problem to this day.
3 GAUSSIAN PROCESS PRIORS
AND BAYESIAN QUADRATURE
In this section, we introduce the key concepts of
Gaussian process priors and Bayesian quadrature,
which are crucial to the derivation of the filtering al-
gorithm in Section 4.
3.1 Gaussian Process Priors
Uncertainty over functions is naturally expressed by a stochastic process. In Bayesian quadrature, Gaussian processes (GP) are used for their favourable analytical properties. A Gaussian process is a collection of random variables indexed by elements of an index set, any finite number of which has a joint Gaussian density (Rasmussen and Williams, 2006). That is, for any finite set of indices $X' = \{x'_1, x'_2, \ldots, x'_m\}$, it holds that

$$\big( g(x'_1),\, g(x'_2),\, \ldots,\, g(x'_m) \big)^\top \sim \mathrm{N}(0, K), \tag{14}$$

where the kernel (covariance) matrix $K$ is made up of pair-wise evaluations of the kernel function, thus $[K]_{ij} = k(x_i, x_j)$. Choosing a kernel, which in principle can be any symmetric positive definite function of two arguments, introduces assumptions about the underlying function we are trying to model. Bayesian inference allows us to combine the GP prior $p(g)$ with the data $\mathcal{D} = \{(x_i, g(x_i)),\ i = 1, \ldots, N\}$, comprising the evaluation points $X = [x_1, \ldots, x_N]$ and the function evaluations $y_g = [g(x_1), \ldots, g(x_N)]^\top$, to produce a GP posterior $p(g \mid \mathcal{D})$ with moments given by (Rasmussen and Williams, 2006)

$$\mathbb{E}_g[g(x')] = m_g(x') = k^\top(x')\, K^{-1} y_g, \tag{15}$$
$$\mathbb{V}_g[g(x')] = \sigma^2_g(x') = k(x', x') - k^\top(x')\, K^{-1} k(x'), \tag{16}$$

where $k(x') = [k(x', x_1), \ldots, k(x', x_N)]^\top$. Thus, for any test input $x'$, we recover a Gaussian posterior predictive density over the function values $g(x')$. Figure 1 depicts the predictive moments of the GP posterior density. Notice that in between the evaluations, where the true function value is not known, the GP model is uncertain.
Figure 1: True function (dashed), GP posterior mean (solid), observed function values (dots) and GP posterior samples (grey). The shaded area represents the GP posterior predictive uncertainty ($\pm 2\sigma_g(x')$). Notice the collapse of uncertainty around the observations.
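The posterior moments (15)-(16) can be computed in a few lines. A minimal sketch, with a small jitter term added for numerical stability (a standard safeguard, not part of the formulas above):

```python
import numpy as np

def gp_posterior(kernel, X, y, x_star, jitter=1e-10):
    """GP posterior mean (15) and variance (16) at a test input x_star.

    kernel(a, b) -> scalar; X: (N, n) evaluation points; y: (N,) values.
    """
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    K += jitter * np.eye(len(X))                 # numerical safeguard
    k_star = np.array([kernel(x_star, xi) for xi in X])
    K_inv = np.linalg.inv(K)
    mean = k_star @ K_inv @ y                    # (15)
    var = kernel(x_star, x_star) - k_star @ K_inv @ k_star  # (16)
    return mean, var
```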
3.2 Bayesian Quadrature
The problem of numerical quadrature pertains to the approximate computation of the integral

$$\mathbb{E}_x[g(x)] = \int g(x)\, p(x)\, \mathrm{d}x. \tag{17}$$

The key distinguishing feature of BQ is that it "treats the problem of numerical integration as the one of statistical inference" (O'Hagan, 1991). This is achieved by placing a prior density over the integrated functions themselves. A consequence of this is that the integral itself is then a random variable as well. Concretely, if a GP prior density is used, then the value of the integral of the function will also be Gaussian distributed. This follows from the fact that the integral is a linear operator acting on the GP-distributed random function $g(x)$.

Following the line of thought of (Rasmussen and Ghahramani, 2003), we take the expectation (with respect
ICINCO2015-12thInternationalConferenceonInformaticsinControl,AutomationandRobotics
382
to $p(g \mid \mathcal{D})$) of the integral (17) and obtain

$$\mathbb{E}_{g|\mathcal{D}}\big[\mathbb{E}_x[g(x)]\big] = \iint g(x)\, p(x)\, \mathrm{d}x\; p(g \mid \mathcal{D})\, \mathrm{d}g = \iint g(x)\, p(g \mid \mathcal{D})\, \mathrm{d}g\; p(x)\, \mathrm{d}x = \mathbb{E}_x\big[\mathbb{E}_{g|\mathcal{D}}[g(x)]\big]. \tag{18}$$

From (18) we see that taking the expectation of the integral is the same as integrating the GP posterior mean function, which effectively approximates the integrated function $g(x)$. The variance of the integral is

$$\mathbb{V}_{g|\mathcal{D}}\big[\mathbb{E}_x[g(x)]\big] = \iint k(x, x')\, p(x)\, p(x')\, \mathrm{d}x\, \mathrm{d}x'. \tag{19}$$
A popular choice of kernel function that enables the expressions (18) and (19) to be computed analytically is the Exponentiated Quadratic (EQ)

$$k(x_i, x_j; \theta) = \alpha^2 \exp\!\left( -\tfrac{1}{2} (x_i - x_j)^\top \Lambda^{-1} (x_i - x_j) \right), \tag{20}$$

where the vertical lengthscale $\alpha$ and the horizontal lengthscales on the diagonal of $\Lambda = \mathrm{diag}\big([\ell^2_1, \ldots, \ell^2_n]\big)$ are kernel hyper-parameters, collectively denoted by the symbol $\theta$. Using this particular kernel introduces the assumption of smoothness (infinite differentiability) of the integrand (Rasmussen and Williams, 2006). Given the kernel function in the form (20) and
$p(x) = \mathrm{N}(m, P)$, the expressions for the integral posterior mean and variance reduce to

$$\mathbb{E}_{g|\mathcal{D}}\big[\mathbb{E}_x[g(x)]\big] = l^\top K^{-1} y_g, \tag{21}$$
$$\mathbb{V}_{g|\mathcal{D}}\big[\mathbb{E}_x[g(x)]\big] = \alpha^2 \left| 2\Lambda^{-1} P + I \right|^{-1/2} - l^\top K^{-1} l, \tag{22}$$

with $l = [l_1, \ldots, l_N]^\top$,

$$l_i = \int k(x, x_i)\, \mathrm{N}(x \mid m, P)\, \mathrm{d}x = \alpha^2 \left| \Lambda^{-1} P + I \right|^{-1/2} \exp\!\left( -\tfrac{1}{2} (x_i - m)^\top (\Lambda + P)^{-1} (x_i - m) \right). \tag{23}$$

Notice that we could define the weights as $w = K^{-1} l$. Then the expression (21) is just a weighted sum of function evaluations, conforming to the general σ-point method as described by (11). As opposed to classical quadrature rules, which prescribe the precise locations of the σ-points, BQ makes no such restrictions. In (Minka, 2000), the optimal placement is determined by minimizing the posterior variance of the integral (19).
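A sketch of (21)-(23) for the EQ kernel (20) follows; the function name and interface are illustrative, and no numerical safeguards are included:

```python
import numpy as np

def bq_weights(X, alpha, ell, m, P):
    """BQ mean weights w = K^{-1} l and integral variance (22),
    for the EQ kernel (20) with Λ = diag(ell²) and p(x) = N(m, P).

    X: (N, n) σ-points; implements (21)-(23).
    """
    N, n = X.shape
    Lam = np.diag(np.asarray(ell) ** 2)
    iLam = np.linalg.inv(Lam)
    # kernel matrix K from (20)
    K = np.array([[alpha**2 * np.exp(-0.5 * (xi - xj) @ iLam @ (xi - xj))
                   for xj in X] for xi in X])
    # kernel-density expectations l_i from (23)
    c = alpha**2 / np.sqrt(np.linalg.det(iLam @ P + np.eye(n)))
    S = np.linalg.inv(Lam + P)
    l = c * np.exp(-0.5 * np.einsum('ij,jk,ik->i', X - m, S, X - m))
    w = np.linalg.solve(K, l)                      # weights in (21)
    # integral variance (22)
    var = alpha**2 / np.sqrt(np.linalg.det(2 * iLam @ P + np.eye(n))) - l @ w
    return w, var
```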
In the next section, we show how the integral variance (19) can be reflected in current quadrature-based nonlinear filtering algorithms.
4 BAYES-HERMITE KALMAN
FILTER
In this section, we show how the integral variance
can be incorporated into the moment estimates of
the transformed random variable. Parallels are drawn
with existing GP-based filters and the Bayes-Hermite
Kalman filter algorithm is outlined.
4.1 Incorporating Integral Uncertainty
Uncertainty over the function values is introduced by a GP posterior $p(g \mid \mathcal{D})$, whose mean function (15) effectively acts as an approximation to the deterministic function $g$. Note that the equations (15), (16) can only be used to model a single output dimension of the vector function $g$. For now, we will assume a scalar function $g$ unless otherwise stated. To keep the notation uncluttered, conditioning on $\mathcal{D}$ will be omitted. Treating the function values $g(x)$ as random leads to the joint density $p(g, x)$ and thus, when computing the moments of $g(x)$, the expectations need to be taken with respect to both variables. This results in the following approximations of the true moments

$$\mu = \mathbb{E}_x[g(x)] \approx \mathbb{E}_{g,x}[g(x)], \tag{24}$$
$$\sigma^2 = \mathbb{V}_x[g(x)] \approx \mathbb{V}_{g,x}[g(x)]. \tag{25}$$

Using the law of iterated expectations, we get

$$\mathbb{E}_{g,x}[g(x)] = \mathbb{E}_g\big[\mathbb{E}_x[g(x)]\big] = \mathbb{E}_x\big[\mathbb{E}_g[g(x)]\big]. \tag{26}$$

This fact was used to derive weights for the filtering and smoothing algorithms in (Särkkä et al., 2014), where the same weights were used in the computations of means and covariances. Our proposed approach, however, proceeds differently in the derivation of the weights used in the computation of covariance matrices.
Note that the term for the variance can be written out using the decomposition formula either as

$$\mathbb{V}_{g,x}[g(x)] = \mathbb{E}_x\big[\mathbb{V}_g[g(x)]\big] + \mathbb{V}_x\big[\mathbb{E}_g[g(x)]\big] \tag{27}$$

or as

$$\mathbb{V}_{g,x}[g(x)] = \mathbb{E}_g\big[\mathbb{V}_x[g(x)]\big] + \mathbb{V}_g\big[\mathbb{E}_x[g(x)]\big], \tag{28}$$

depending on which factorization of the joint density $p(g, x)$ is used. The terms $\mathbb{V}_g[g(x)]$ and $\mathbb{V}_g\big[\mathbb{E}_x[g(x)]\big]$ can be identified as the variance of the integrand and the variance of the integral, respectively. In the case of deterministic $g$, both of these terms are zero.

With the EQ covariance (20), the expression (26) for the first moment of a transformed random variable takes on the form (21). Since the variance decompositions in (27) and (28) are equivalent, both can be used to achieve the same goal.
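To make the equivalence of (27) and (28) concrete, the following Monte Carlo check uses a toy random function (our illustrative choice, not a GP): both decompositions recover the same total variance.

```python
import numpy as np

# Toy stand-in for the uncertainty over g: a random amplitude
# a ~ N(1, 0.1) in g(x) = a*sin(x), with input x ~ N(0, 1).
rng = np.random.default_rng(1)
a = rng.normal(1.0, np.sqrt(0.1), 100_000)   # "function" randomness
x = rng.normal(0.0, 1.0, 100_000)            # input randomness
total = np.var(a * np.sin(x))                # V_{g,x}[g(x)]
# (27): E_x[V_g[g(x)]] + V_x[E_g[g(x)]], with V_g[g|x] = 0.1 sin²(x)
dec27 = np.mean(0.1 * np.sin(x)**2) + np.var(np.sin(x))
# (28): E_g[V_x[g(x)]] + V_g[E_x[g(x)]]; here E_x[a sin(x)] = 0,
# so the integral-variance term vanishes for this symmetric toy case
dec28 = np.mean(a**2) * np.var(np.sin(x))
print(total, dec27, dec28)                   # all three agree
```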
BayesianQuadratureinNonlinearFiltering
383
The form (27) was utilized in the derivation of the Gaussian process assumed density filter (GP-ADF) (Deisenroth et al., 2012), which relies on the solution to the problem of prediction with GPs at uncertain inputs (Girard et al., 2003). So, even though these results were derived to solve a seemingly different problem, we point out that by using the form (27), the uncertainty of the integral (as seen in the last term of (28)) is implicitly reflected in the resulting covariance. To conserve space, we only provide a summary of the results in (Deisenroth et al., 2009) and point the reader to the said reference for detailed derivations. The expressions for the moments of the transformed variable were rewritten into a form which assumes that a single GP is used to model all the output dimensions of the vector function (8)

$$\mu = G^\top w, \tag{29}$$
$$\Sigma = G^\top W G - \mu\mu^\top + \mathrm{diag}\!\left( \alpha^2 - \mathrm{tr}\!\left( K^{-1} L \right) \right), \tag{30}$$

with the matrix $G$ being defined analogously to $F$ in (11)-(13). The weights are given as

$$w = K^{-1} l \quad \text{and} \quad W = K^{-1} L K^{-1}, \tag{31}$$

where

$$L = \int k(X, x; \theta_g)\, k(x, X; \theta_g)\, \mathrm{N}(x \mid m, P)\, \mathrm{d}x. \tag{32}$$

The equations (29) and (30) bear a certain resemblance to the σ-point method in (11), (12); in this case, however, the matrix $W$ is not diagonal. Note that the weights depend on the current locations of the σ-points and need to be recomputed at every time step.
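The covariance weights (31) require the matrix $L$ from (32). For the EQ kernel, $L$ has a closed form obtainable from standard Gaussian product-and-integrate identities; the sketch below uses our reconstruction of that form, which should be checked against (Deisenroth et al., 2009) before serious use:

```python
import numpy as np

def bq_cov_weights(X, alpha, ell, m, P):
    """Covariance weights W = K^{-1} L K^{-1} from (31), with L from (32),
    for the EQ kernel (20). Also returns the scalar correction
    alpha^2 - tr(K^{-1} L) entering the diag(...) term of (30).
    """
    N, n = X.shape
    Lam = np.diag(np.asarray(ell) ** 2)
    iLam = np.linalg.inv(Lam)
    K = np.array([[alpha**2 * np.exp(-0.5 * (xi - xj) @ iLam @ (xi - xj))
                   for xj in X] for xi in X])
    det = np.sqrt(np.linalg.det(2.0 * iLam @ P + np.eye(n)))
    S = np.linalg.inv(0.5 * Lam + P)
    L = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            d = X[i] - X[j]              # difference of σ-points
            c = 0.5 * (X[i] + X[j]) - m  # midpoint minus density mean
            L[i, j] = (alpha**4 / det
                       * np.exp(-0.25 * d @ iLam @ d - 0.5 * c @ S @ c))
    Ki = np.linalg.inv(K)
    return Ki @ L @ Ki, alpha**2 - np.trace(Ki @ L)
```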
4.2 BHKF Algorithm
The filtering algorithm based on BQ can now be constructed utilizing (29) and (30). The BHKF uses two GPs with the EQ covariance, one for each function in the state-space model (1)-(2), which means that two sets of hyper-parameters are used: $\theta_f$ and $\theta_h$. In the algorithm specification below, the lower index of $l$ and $K$ specifies the set of hyper-parameters used to construct these quantities.

Algorithm (Bayes-Hermite Kalman Filter). In the following, let $x_{0|0} \sim \mathrm{N}\big(m_{0|0}, P_{0|0}\big)$, $i = 1, \ldots, N$ and $k = 1, 2, \ldots$.

Initialization:
Choose unit σ-points $\xi^{(i)}$. Set the hyper-parameters $\theta_f$ and $\theta_h$. Proceed from the initial conditions $x_{0|0}$, for all $k$, by alternating between the following prediction and filtering steps.

Prediction:
1. Form the σ-points $x^{(i)}_{k-1} = m^x_{k-1|k-1} + \sqrt{P^x_{k-1|k-1}}\, \xi^{(i)}$.
2. Propagate the σ-points through the dynamics model, $x^{(i)}_k = f\big(x^{(i)}_{k-1}\big)$, and form $F$ as in (11)-(13).
3. Using $x^{(i)}_{k-1}$ and the hyper-parameters $\theta_f$, compute the weights $w^x$ and $W^x$ according to (31) and (32).
4. Compute the predictive mean $m^x_{k|k-1}$ and the predictive covariance $P^x_{k|k-1}$:
$$m^x_{k|k-1} = F^\top w^x,$$
$$P^x_{k|k-1} = F^\top W^x F - m^x_{k|k-1}\big(m^x_{k|k-1}\big)^\top + \mathrm{diag}\!\left( \alpha^2 - \mathrm{tr}\!\left( K^{-1} L \right) \right) + Q.$$

Filtering:
1. Form the σ-points $x^{(i)}_k = m^x_{k|k-1} + \sqrt{P^x_{k|k-1}}\, \xi^{(i)}$.
2. Propagate the σ-points through the measurement model, $z^{(i)}_k = h\big(x^{(i)}_k\big)$, and form $H$ as in (11)-(13).
3. Using $x^{(i)}_k$ and the hyper-parameters $\theta_h$, compute the weights $w^z$ and $W^z$ according to (31) and (32). Construct $W^{xz} = \mathrm{diag}(l_h)\, K_h^{-1}$.
4. Compute the measurement mean, covariance and state-measurement cross-covariance:
$$m^z_{k|k-1} = H^\top w^z,$$
$$P^z_{k|k-1} = H^\top W^z H - m^z_{k|k-1}\big(m^z_{k|k-1}\big)^\top + \mathrm{diag}\!\left( \alpha^2 - \mathrm{tr}\!\left( K^{-1} L \right) \right) + R,$$
$$P^{xz}_{k|k-1} = P^x_{k|k-1}\big(P^x_{k|k-1} + \Lambda\big)^{-1}\, \tilde{X}^\top W^{xz} H,$$
where the $i$-th row of $\tilde{X}$ is $x^{(i)}_k - m^x_{k|k-1}$.
5. Compute the filtered mean $m^x_{k|k}$ and the filtered covariance $P^x_{k|k}$:
$$m^x_{k|k} = m^x_{k|k-1} + K_k\big(z_k - m^z_{k|k-1}\big),$$
$$P^x_{k|k} = P^x_{k|k-1} - K_k P^z_{k|k-1} K_k^\top,$$
with the Kalman gain $K_k = P^{xz}_{k|k-1}\big(P^z_{k|k-1}\big)^{-1}$.
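Putting the pieces together, one prediction step of the algorithm above can be sketched as follows, reusing the earlier bq_weights and bq_cov_weights sketches (the filtering step is analogous):

```python
import numpy as np

def bhkf_predict(f, m, P, xi, alpha, ell, Q):
    """One BHKF prediction step (steps 1-4 above); an illustrative sketch.

    xi: (N, n) array of unit σ-points; alpha, ell are the components of
    theta_f; Q: (n, n) state noise covariance.
    """
    n = len(m)
    X = m + xi @ np.linalg.cholesky(P).T    # step 1: σ-points x^(i)_{k-1}
    F = np.array([f(x) for x in X])         # step 2: propagate through f
    w, _ = bq_weights(X, alpha, ell, m, P)  # step 3: BQ weights
    W, corr = bq_cov_weights(X, alpha, ell, m, P)
    m_pred = F.T @ w                        # step 4: predictive mean
    P_pred = (F.T @ W @ F - np.outer(m_pred, m_pred)
              + corr * np.eye(n) + Q)       # covariance with BQ correction
    return m_pred, P_pred
```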
5 NUMERICAL ILLUSTRATION
In the numerical simulations, the performance of the filters was tested on the univariate non-stationary growth model (UNGM) (Gordon et al., 1993)

$$x_k = \tfrac{1}{2} x_{k-1} + \frac{25\, x_{k-1}}{1 + x^2_{k-1}} + 8 \cos(1.2\, k) + q_{k-1}, \tag{33}$$
$$z_k = \tfrac{1}{20}\, x^2_k + r_k, \tag{34}$$

with the state noise $q_{k-1} \sim \mathrm{N}(0, 10)$, the measurement noise $r_k \sim \mathrm{N}(0, 1)$ and the initial conditions $x_{0|0} \sim \mathrm{N}(0, 5)$.
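For reference, the UNGM (33)-(34) can be simulated as in the following sketch (the seed is arbitrary):

```python
import numpy as np

def simulate_ungm(K=500, seed=0):
    """Simulate the UNGM (33)-(34) for K time steps."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(5.0))  # x_0 ~ N(0, 5)
    xs, zs = np.empty(K), np.empty(K)
    for k in range(1, K + 1):
        x = (0.5 * x + 25.0 * x / (1.0 + x**2)
             + 8.0 * np.cos(1.2 * k) + rng.normal(0.0, np.sqrt(10.0)))
        xs[k - 1] = x
        zs[k - 1] = x**2 / 20.0 + rng.normal(0.0, 1.0)
    return xs, zs
```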
ICINCO2015-12thInternationalConferenceonInformaticsinControl,AutomationandRobotics
384
Since the BHKF does not prescribe the σ-point locations, they can be chosen at will. The GHKF based on the $r$-th order Gauss-Hermite (GH) quadrature rule uses σ-points determined as the roots of the $r$-th degree univariate Hermite polynomial $H_r(x)$. When a function of a vector argument ($n > 1$) is to be integrated, a multidimensional grid of points is formed by the Cartesian product, leading to their exponential growth ($N = r^n$). The GH weights are computed according to (Särkkä, 2013) as

$$w_i = \frac{r!}{\left[ r\, H_{r-1}\big(x^{(i)}\big) \right]^2}. \tag{35}$$
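A sketch of this construction: numpy.polynomial.hermite_e.hermegauss returns nodes and weights for the weight function $\exp(-x^2/2)$, so the weights must be rescaled by $1/\sqrt{2\pi}$ to match (35) for a standard Gaussian density (assuming the probabilists' Hermite convention of (Särkkä, 2013)); the Cartesian product then gives the multidimensional grid:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gh_points_weights(r, n):
    """GH σ-points and weights for an n-dimensional standard Gaussian,
    built as a Cartesian product of the r-th order rule (N = r^n)."""
    x1, w1 = hermegauss(r)
    w1 = w1 / np.sqrt(2 * np.pi)   # rescale to the weights of (35)
    grids = np.meshgrid(*([x1] * n), indexing='ij')
    xi = np.stack([g.ravel() for g in grids], axis=-1)   # (r^n, n) points
    wg = np.stack(np.meshgrid(*([w1] * n), indexing='ij'), axis=-1)
    w = np.prod(wg.reshape(-1, n), axis=1)               # product weights
    return xi, w
```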
The Unscented Transform (UT) is also a simple quadrature rule (Ito and Xiong, 2000) that uses $N = 2n + 1$ deterministically chosen σ-points

$$x^{(i)} = m + \sqrt{P}\, \xi^{(i)}, \tag{36}$$

with the unit σ-points defined as columns of the matrix

$$\left[ \xi^{(0)},\, \xi^{(1)},\, \ldots,\, \xi^{(2n)} \right] = \left[ 0,\; c\, I_n,\; -c\, I_n \right], \tag{37}$$

where $I_n$ denotes the $n \times n$ identity matrix. The corresponding weights are defined by

$$w_0 = \frac{\kappa}{n + \kappa}, \qquad w_i = \frac{1}{2(n + \kappa)}, \quad i = 1, \ldots, 2n, \tag{38}$$

with the scaling factor $c = \sqrt{n + \kappa}$. All of the BHKFs used the same set of hyper-parameters, $\theta_f = \theta_h = [\ell, \alpha]^\top = [3, 1]^\top$. The UKF operated with $\kappa = 2$. BHKFs that used UT and GH σ-points of orders 5, 7 and 10 were compared with their classical quadrature-based counterparts, namely, the UKF and the GHKF of orders 5, 7 and 10.
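The UT construction (36)-(38) in code (a direct transcription; names are ours):

```python
import numpy as np

def ut_points_weights(n, kappa=2.0):
    """Unit σ-points (37) and UT weights (38).

    Returns xi: (2n+1, n) unit σ-points and w: (2n+1,) weights.
    """
    c = np.sqrt(n + kappa)   # scaling factor
    xi = np.vstack([np.zeros((1, n)), c * np.eye(n), -c * np.eye(n)])
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return xi, w
```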
We performed 100 simulations, each for $K = 500$ time steps. The root-mean-square error (RMSE)

$$\mathrm{RMSE} = \sqrt{ \frac{1}{K} \sum_{k=1}^{K} \left( x_k - m^x_{k|k} \right)^2 } \tag{39}$$

was used to measure the overall error in the state estimate $m^x_{k|k}$ across all time steps. As a metric that takes the estimated state covariance into account, the non-credibility index (NCI) (Li and Zhao, 2006), given by

$$\mathrm{NCI} = \frac{10}{K} \sum_{k=1}^{K} \log_{10} \frac{ \big( x_k - m^x_{k|k} \big)^\top P^{-1}_{k|k} \big( x_k - m^x_{k|k} \big) }{ \big( x_k - m^x_{k|k} \big)^\top \Sigma^{-1}_k \big( x_k - m^x_{k|k} \big) }, \tag{40}$$

was used, where $\Sigma_k$ is the mean-square-error matrix. A filter is said to be optimistic if it underestimates the actual error, which is indicated by $\mathrm{NCI} > 0$. A perfectly credible filter would provide $\mathrm{NCI} = 0$; that is, it would neither overestimate nor underestimate the actual error.
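Both metrics are straightforward to compute from the simulation output; a sketch, assuming stacked arrays over the K time steps:

```python
import numpy as np

def rmse(x, m):
    """RMSE (39); x, m: (K, n) arrays of true states and filtered means."""
    return np.sqrt(np.mean(np.sum((x - m) ** 2, axis=1)))

def nci(x, m, P, S):
    """NCI (40); P, S: (K, n, n) stacked filtered covariances and
    mean-square-error matrices (Li and Zhao, 2006)."""
    e = x - m
    q_p = np.einsum('ki,kij,kj->k', e, np.linalg.inv(P), e)
    q_s = np.einsum('ki,kij,kj->k', e, np.linalg.inv(S), e)
    return 10.0 / len(x) * np.sum(np.log10(q_p) - np.log10(q_s))
```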
Table 1: The average root-mean-square error.

        BQ               Classical
UT      10.544 ± 0.048   11.081 ± 0.159
GH5     10.740 ± 0.070   10.257 ± 0.133
GH7     10.306 ± 0.053    9.855 ± 0.133
GH10    10.431 ± 0.058    9.705 ± 0.120
The tables show the average values of the performance metrics across simulations, with estimates of ±2 standard deviations (obtained by bootstrapping (Wasserman, 2006)). As evidenced by the results in Table 1, the BQ provides superior RMSE performance only in the case of UT σ-points. In the classical quadrature case, the performance improves with an increasing number of σ-points. This trend is not observed in the BQ case. We suspect that this is due to the hyper-parameters $\theta_f$, $\theta_h$ being fixed to the same value regardless of the number of σ-points used. The σ-points effectively act as a training set of the GP model, and thus it would make sense to use different values of the hyper-parameters with training sets of different sizes. Figure 2 illustrates the effect of changing the lengthscale on the overall performance of the BHKF with UT σ-points.
Figure 2: Sensitivity of the BHKF performance to changes in the lengthscale hyper-parameter ℓ (RMSE and NCI plotted over ℓ from 10⁻³ to 10³). The choice ℓ = 3 minimizes the NCI at the cost of a slightly higher RMSE.
Table 2: The average non-credibility index.

        BQ              Classical
UT      5.106 ± 0.010   12.071 ± 0.045
GH5     4.977 ± 0.013   10.228 ± 0.065
GH7     4.321 ± 0.009    9.256 ± 0.070
GH10    3.647 ± 0.010    8.042 ± 0.077
The self-assessment of the filter performance is less optimistic in the case of BQ, as indicated by the lower NCI in Table 2. This indicates that the BQ-based filters are more conservative in their covariance estimates, which is a consequence of including the additional uncertainty (the integral variance) that the classical quadrature-based filters do not utilize.
BayesianQuadratureinNonlinearFiltering
385
6 CONCLUSIONS
In this paper, we proposed a way of utilizing uncer-
tainty associated with integral approximations in the
nonlinear quadrature-based filtering algorithms. This
was enabled by the Bayesian treatment of quadrature.
The proposed filtering algorithms were tested on a univariate benchmarking example. The results show that the filters utilizing the additional uncertainty provided by the BQ achieve a significant improvement in the credibility of their estimates.
Proper setting of the hyper-parameters is crucial for achieving competitive results; the need for a principled approach to choosing them could prompt further research. Another possible research direction could be concerned with the adaptive placement of σ-points based on the posterior integral variance.
ACKNOWLEDGEMENTS
This work was supported by the Czech Science Foun-
dation, project no. GACR P103-13-07058J.
REFERENCES
Arasaratnam, I. and Haykin, S. (2009). Cubature Kalman
Filters. IEEE Transactions on Automatic Control,
54(6):1254–1269.
Bhar, R. (2010). Stochastic filtering with applications in
finance. World Scientific.
Deisenroth, M. P., Huber, M. F., and Hanebeck, U. D.
(2009). Analytic moment-based Gaussian process fil-
tering. In Proceedings of the 26th Annual Interna-
tional Conference on Machine Learning - ICML ’09,
pages 1–8. ACM Press.
Deisenroth, M. P. and Ohlsson, H. (2011). A General Per-
spective on Gaussian Filtering and Smoothing: Ex-
plaining Current and Deriving New Algorithms. In
American Control Conference (ACC), 2011, pages
1807–1812. IEEE.
Deisenroth, M. P., Turner, R. D., Huber, M. F., Hanebeck,
U. D., and Rasmussen, C. E. (2012). Robust Filtering
and Smoothing with Gaussian Processes. IEEE Trans-
actions on Automatic Control, 57(7):1865–1871.
Duník, J., Straka, O., and Šimandl, M. (2013). Stochastic
Integration Filter. IEEE Transactions on Automatic
Control, 58(6):1561–1566.
Gelb, A. (1974). Applied Optimal Estimation. The MIT
Press.
Gillijns, S., Mendoza, O., Chandrasekar, J., De Moor, B.,
Bernstein, D., and Ridley, A. (2006). What is the ensemble Kalman filter and how well does it work? In American Control Conference, 2006, page 6.
Girard, A., Rasmussen, C. E., Quiñonero Candela, J., and
Murray-Smith, R. (2003). Gaussian Process Priors
With Uncertain Inputs Application to Multiple-Step
Ahead Time Series Forecasting. In Becker, S., Thrun,
S., and Obermayer, K., editors, Advances in Neural
Information Processing Systems 15, pages 545–552.
MIT Press.
Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993).
Novel approach to nonlinear/non-Gaussian Bayesian
state estimation. IEE Proceedings F (Radar and Sig-
nal Processing), 140(2):107–113.
Grewal, M. S., Weill, L. R., and Andrews, A. P. (2007).
Global Positioning Systems, Inertial Navigation, and
Integration. Wiley.
Ito, K. and Xiong, K. (2000). Gaussian Filters for Nonlinear
Filtering Problems. IEEE Transactions on Automatic
Control, 45(5):910–927.
Jiang, T., Sidiropoulos, N., and Giannakis, G. (2003).
Kalman filtering for power estimation in mobile com-
munications. Wireless Communications, IEEE Trans-
actions on, 2(1):151–161.
Julier, S. J., Uhlmann, J. K., and Durrant-Whyte, H. F.
(2000). A New Method for the Nonlinear Transfor-
mation of Means and Covariances in Filters and Es-
timators. IEEE Transactions on Automatic Control,
45(3):477–482.
Kalman, R. E. (1960). A New Approach to Linear Filtering
and Prediction Problems. Journal of Basic Engineer-
ing, 82(1):35–45.
Li, X. R. and Zhao, Z. (2006). Measuring Estimator’s Cred-
ibility: Noncredibility Index. In Information Fusion,
2006 9th International Conference on, pages 1–8.
Maybeck, P. S. (1982). Stochastic Models, Estimation and
Control: Volume 2. Academic Press.
Minka, T. P. (2000). Deriving Quadrature Rules from Gaussian Processes. Technical report, Statistics Department, Carnegie Mellon University.
Nørgaard, M., Poulsen, N. K., and Ravn, O. (2000). New
developments in state estimation for nonlinear sys-
tems. Automatica, 36:1627–1638.
O’Hagan, A. (1991). Bayes–Hermite quadrature. Journal
of Statistical Planning and Inference, 29(3):245–260.
Osborne, M. A., Rasmussen, C. E., Duvenaud, D. K., Gar-
nett, R., and Roberts, S. J. (2012). Active Learning
of Model Evidence Using Bayesian Quadrature. In
Advances in Neural Information Processing Systems
(NIPS), pages 46–54.
Rasmussen, C. E. and Ghahramani, Z. (2003). Bayesian
monte carlo. In S. Becker, S. T. and Obermayer, K.,
editors, Advances in Neural Information Processing
Systems 15, pages 489–496. MIT Press, Cambridge,
MA.
Rasmussen, C. E. and Williams, C. K. (2006). Gaussian
Processes for Machine Learning. The MIT Press.
Särkkä, S. (2013). Bayesian Filtering and Smoothing. Cam-
bridge University Press, New York.
Särkkä, S., Hartikainen, J., Svensson, L., and Sandblom,
F. (2014). Gaussian Process Quadratures in Nonlin-
ear Sigma-Point Filtering and Smoothing. In Informa-
ICINCO2015-12thInternationalConferenceonInformaticsinControl,AutomationandRobotics
386
tion Fusion (FUSION), 2014 17th International Con-
ference on, pages 1–8.
Sarmavuori, J. and Särkkä, S. (2012). Fourier-Hermite
Kalman Filter. IEEE Transactions on Automatic Con-
trol, 57(6):1511–1515.
Šimandl, M. and Duník, J. (2009). Derivative-free estima-
tion methods: New results and performance analysis.
Automatica, 45(7):1749–1757.
Smith, G. L., Schmidt, S. F., and McGee, L. A. (1962). Ap-
plication of statistical filter theory to the optimal esti-
mation of position and velocity on board a circumlu-
nar vehicle. Technical report, NASA.
Wasserman, L. (2006). All of Nonparametric Statistics.
Springer-Verlag New York.
BayesianQuadratureinNonlinearFiltering
387