STATIONARY FULLY PROBABILISTIC CONTROL DESIGN

Tatiana V. Guy
Adaptive Systems Department, Institute of Information Theory and Automation
P.O. Box 18, 182 08 Prague 8, Czech Republic

Miroslav Kárný
Adaptive Systems Department, Institute of Information Theory and Automation
P.O. Box 18, 182 08 Prague 8, Czech Republic

Keywords: Stochastic control design, stationary control, fully probabilistic design, stationary state-space models.
Abstract: Stochastic control design chooses the controller that makes the closed-loop behaviour as close as possible to the desired one. Fully probabilistic design describes both the closed loop and its desired behaviour in probabilistic terms and uses the Kullback-Leibler divergence as their proximity measure. Such a design provides an explicit minimizer, which opens a way to simpler approximations of analytically infeasible cases. Current formulations are oriented towards finite-horizon design; consequently, the optimal strategy is a non-stationary one. This paper formulates and solves the infinite-horizon problem. It leads to a stationary strategy whose approximation is much easier.
1 INTRODUCTION

Stochastic control design (Kushner, 1971) chooses a control law that makes the closed-loop behaviour of the controlled system as close as possible to the desired behaviour. For a review of the numerous existing solution methods and of the major restrictions on their application, the survey (Lee and Lee, 2004) can be recommended.

In a wider context, stochastic control design can be viewed as a specific case of Bayesian dynamic decision making (Berger, 1985), which minimises the expected value of a loss function expressing the control aim. A probabilistic description of both the closed-loop behaviour of the controlled system and the desired behaviour forms the ground for fully probabilistic design (FPD) of stochastic control. This design (Kárný, 1996; Kárný et al., 2003; Kárný et al., 2005; Kárný and Guy, 2004) selects randomised control laws that make the entire joint distribution of the variables describing the closed-loop behaviour as close as possible to their desired distribution. The paper considers the asymptotic version of FPD. It suits situations in which the control horizon is large enough, and it leads to a simplified design applicable to a wider set of control problems than the non-stationary, finite-horizon version.
The next section introduces the necessary notions and notation. Section 3 recalls the FPD in the most general state-space setting and extends its solution to a growing horizon; this is the main result of the paper. Section 4 concludes the paper with a discussion of the practical significance of the result obtained.
2 PRELIMINARIES

In the paper, $\triangleq$ stands for equality by definition; $X^*$ denotes the set of all values of $X$; $\mathring{X}$ means the cardinality of a finite set $X^*$; $X(t)$ stands for the sequence $(X_1, \ldots, X_t)$; $f(\cdot|\cdot)$ denotes a probability density function (pdf); $t$ labels discrete-time moments, $t \in t^* \triangleq \{1, \ldots, \mathring{t}\}$; $\mathring{t}$ is a given control horizon that can grow up to infinity; $d_t = (y_t, u_t)$ is the finite-dimensional data record at discrete time $t$, consisting of the observed system output $y_t$ and of the optional system input $u_t$; $x_t$ stands for the finite-dimensional unobserved system state at time $t$.

Arguments distinguish the respective pdfs, and no formal distinction is made between a random variable, its realisation and an argument of a pdf. The integrals encountered are multivariate and definite, with integration domains coinciding with those of the integrands.
[Guy, T. V. and Kárný, M. (2005). Stationary Fully Probabilistic Control Design. In Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics, pages 109-112. DOI: 10.5220/0001183101090112. © SciTePress.]

The FPD exploits the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951), an information-entropy-based measure,

$$D(f\,\|\,\tilde f) \triangleq \int f(X)\,\ln\frac{f(X)}{\tilde f(X)}\,dX. \qquad (1)$$

It measures the proximity of pdfs $f$, $\tilde f$ acting on a set $X^*$ and has the following key property:

$$D(f\,\|\,\tilde f) \ge 0, \qquad D(f\,\|\,\tilde f) = 0 \ \text{iff}\ f = \tilde f \qquad (2)$$

almost everywhere on $X^*$.
The joint pdf $f(d(\mathring{t}), x(\mathring{t})|x_0, d(0))\, f(x_0|d(0)) = f(d(\mathring{t}), x(\mathring{t})|x_0)\, f(x_0)$ of all considered random variables is the most complete probabilistic description of the closed-loop behaviour. The variable $x_t$ has the character of a closed-loop state, with $x_0$ being the uncertain initial state; $d(0)$ stands for the prior information serving for the choice of the input at time $t = 1$. Further on, $d(0)$ is considered implicitly only.
The chain rule for pdfs (Peterka, 1981) implies the following decomposition of the joint pdf

$$f(d(\mathring{t}), x(\mathring{t})\,|\,x_0)\, f(x_0) = f(x_0) \prod_{t\in t^*} f(y_t|u_t, d(t-1), x(t))\, f(x_t|u_t, d(t-1), x(t-1))\, f(u_t|d(t-1), x(t-1)). \qquad (3)$$

The chosen decomposition (3) distinguishes:
the observation model $f(y_t|u_t, d(t-1), x(t))$;
the time-evolution model $f(x_t|u_t, d(t-1), x(t-1))$;
the randomised control law $f(u_t|d(t-1), x(t-1))$.
As the variable $x_t$ represents the closed-loop state, none of the above models depends on its history, i.e. $x(t) = x_t$. Moreover, the assumed admissible controllers generate the system input $u_t$ using at most the observed data history $d(t-1)$ and cannot use the unobserved states $x(t-1)$. Besides, the addressed stationary design requires all functions to be time-independent. To achieve this, the observed data $d(t-1)$, growing with time $t$, have to enter the models via a fixed-dimensional observable state $\phi_{t-1}$. Thus, the introduced closed-loop description (3) reduces to:
$$f(d(\mathring{t}), x(\mathring{t})\,|\,x_0) = \prod_{t\in t^*} f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})\, f(u_t|d(t-1)), \qquad (4)$$

where the regression vector $\psi_t \triangleq [u_t, \phi_{t-1}]$.
3 FULLY PROBABILISTIC DESIGN

The control aim and constraints are quantified by the so-called ideal pdf that defines the desired joint distribution of the considered closed-loop variables. It is constructed analogously to (4), with the user-specified factors marked by the superscript $I$:

$${}^I\!f(d(\mathring{t}), x(\mathring{t})|x_0)\, {}^I\!f(x_0) = \prod_{t\in t^*} {}^I\!f(y_t|\psi_t, x_t)\, {}^I\!f(x_t|\psi_t, x_{t-1})\, {}^I\!f(u_t|d(t-1))\, f(x_0).$$
The pdfs ${}^I\!f(y_t|\psi_t, x_t)$, ${}^I\!f(x_t|\psi_t, x_{t-1})$ describe the ideal models of observation and state evolution, and ${}^I\!f(u_t|d(t-1))$ the ideal control law. The prior pdf on the possible initial states $x_0$ cannot be influenced by the optimised control strategy, so it is left to its fate, i.e. ${}^I\!f(x_0) = f(x_0)$.
The formulation of FPD is straightforward: find an admissible control strategy minimising the KL divergence $D\big(f(d(\mathring{t}), x(\mathring{t}))\,\|\,{}^I\!f(d(\mathring{t}), x(\mathring{t}))\big)$.

The solution of the FPD requires the solution of the stochastic filtering problem in the closed loop.
Proposition 1 (Filtering in the closed loop) Let the prior pdf $f(x_0)$ be given. Then the pdf $f(x_t|d(t))$, determining the state estimate, and the pdf $f(x_t|u_t, d(t-1))$, determining the state prediction, evolve according to the coupled equations

Time updating
$$f(x_t|u_t, d(t-1)) = \int f(x_t|\psi_t, x_{t-1})\, f(x_{t-1}|d(t-1))\, dx_{t-1}$$

Data updating
$$f(x_t|d(t)) = \frac{f(y_t|\psi_t, x_t)\, f(x_t|u_t, d(t-1))}{\underbrace{\int f(y_t|\psi_t, x_t)\, f(x_t|u_t, d(t-1))\, dx_t}_{f(y_t|u_t,\, d(t-1))}}.$$

The stochastic filtering does not depend on the used admissible control strategy $\{f(u_t|d(t-1))\}_{t\in t^*}$ but only on the generated inputs.
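For a finite state set, the integrals in Proposition 1 become sums, and the coupled updates can be sketched as follows (the two-state transition table and observation likelihood are hypothetical, chosen only for illustration):

```python
import numpy as np

def time_update(T, prior):
    """Time updating: f(x_t|u_t, d(t-1)) = sum_x' f(x_t|psi_t, x') f(x'|d(t-1)).
    T[i, j] = f(x_t = i | psi_t, x_{t-1} = j) for a finite state set."""
    return T @ prior

def data_update(obs_lik, pred):
    """Data updating: Bayes' rule with the observation model as likelihood.
    obs_lik[i] = f(y_t | psi_t, x_t = i); the normaliser is f(y_t|u_t, d(t-1))."""
    post = obs_lik * pred
    return post / post.sum()

# hypothetical 2-state example: T and obs_lik are illustrative numbers only
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])                        # columns sum to 1
prior = np.array([0.5, 0.5])
pred = time_update(T, prior)                      # state prediction
post = data_update(np.array([0.7, 0.1]), pred)    # state estimate f(x_t|d(t))
print(post)
```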
Let the time-invariant state estimate ${}^s\!f(x_t|V_t) = f(x_t|d(t))$ exist, where $V_t$ is a finite-dimensional statistic. Then this function solves the equation

$${}^s\!f(x_t|V_t) = \frac{f(y_t|\psi_t, x_t) \int f(x_t|\psi_t, x_{t-1})\, {}^s\!f(x_{t-1}|V_{t-1})\, dx_{t-1}}{\underbrace{\int f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})\, {}^s\!f(x_{t-1}|V_{t-1})\, dx_{t-1}\, dx_t}_{f(y_t|\psi_t,\, V_{t-1})}} \qquad (5)$$
Proof: See, e.g., (Kárný and Guy, 2004).
Example 1 (Stationary Kalman filter) Let the time-evolution model be $f(x_t|\psi_t, x_{t-1}) = N_{x_t}(Ax_{t-1} + Bu_t, R)$, with $N_z(\hat z, w)$ denoting the normal pdf of $z$ having expectation $\hat z$ and covariance matrix $w$. Let also the observation model be normal, $f(y_t|\psi_t, x_t) = N_{y_t}(Cx_t + Du_t, r)$. Assuming that the matrices $A, B, C, D, R, r$ are known, the stationary estimate is ${}^s\!f(x_t|V_t \triangleq (\hat x_t, P)) = N_{x_t}(\hat x_t, P)$. The expectation and covariance matrix of this estimate fulfil the equations coinciding with the stationary Kalman filter (Meditch, 1969)

$$\hat x_t = A\hat x_{t-1} + Bu_t + K(y_t - C\hat x_{t-1} - Du_t), \text{ with}$$
$$K = PC'r^{-1} \quad \text{and} \quad P^{-1} = C'r^{-1}C + (APA' + R)^{-1}.$$
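The stationary covariance $P$ can be found by iterating the last equation of Example 1 to its fixed point. A scalar Python sketch follows; the system values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# hypothetical scalar system: values chosen only for illustration
A, B, C, D = 0.9, 1.0, 1.0, 0.0
R, r = 0.1, 0.2                      # state and output noise covariances

# fixed-point iteration of P^{-1} = C' r^{-1} C + (A P A' + R)^{-1}
P = 1.0
for _ in range(200):
    P = 1.0 / (C * C / r + 1.0 / (A * P * A + R))

K = P * C / r                        # stationary Kalman gain K = P C' r^{-1}
print(P, K)
```

One filter step then reads `xh = A*xh + B*u + K*(y - C*xh - D*u)` with the converged gain `K`.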
Respecting the aim of the paper, the solution of FPD is written for the stationary state estimate.
Proposition 2 (Solution of FPD) Let the state estimate have reached its stationary form ${}^s\!f(x_t|V_t)$. Then the optimal admissible control strategy in the FPD sense is the randomised one given by the pdfs

$${}^o\!f(u_t|\phi_{t-1}, V_{t-1}) = {}^I\!f(u_t|\phi_{t-1})\, \frac{\exp[-\omega(\psi_t, V_{t-1})]}{\gamma(\phi_{t-1}, V_{t-1})}, \quad t \in t^*, \qquad (6)$$

$$\gamma(\phi_{t-1}, V_{t-1}) \triangleq \int {}^I\!f(u_t|\phi_{t-1})\, \exp[-\omega(\psi_t, V_{t-1})]\, du_t.$$
Starting with $\gamma(\phi_{\mathring{t}}, V_{\mathring{t}}) \triangleq 1$, the functions $\omega(\psi_t, V_{t-1})$ are generated recursively in the backward manner, for $t = \mathring{t}, \mathring{t}-1, \ldots, 1$, as follows:

$$\omega(\psi_t, V_{t-1}) \triangleq \int \Omega(\psi_t, x_{t-1})\, {}^s\!f(x_{t-1}|V_{t-1})\, dx_{t-1}.$$
${}^s\!f(x_t|V_t)$ is updated according to Proposition 1, and

$$\Omega(\psi_t, x_{t-1}) \triangleq \int f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})\, \ln\frac{f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})}{\gamma(\phi_t, V_t)\; {}^I\!f(y_t|\psi_t, x_t)\; {}^I\!f(x_t|\psi_t, x_{t-1})}\, dy_t\, dx_t.$$
Proof: See (Kárný and Guy, 2004).
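To make the backward recursion of Proposition 2 concrete, the following sketch implements it for a hypothetical fully observed finite closed loop: the hidden state, the filtering step and the statistic V drop out, and the observation and evolution models merge into a single transition table. All numbers are illustrative:

```python
import numpy as np

# hypothetical finite closed loop: 2 states, 2 inputs; all tables illustrative
nx, nu = 2, 2
# T[u][x_next, x_prev] = f(x_t = x_next | u_t = u, x_{t-1} = x_prev)
T = np.array([[[0.9, 0.5], [0.1, 0.5]],
              [[0.3, 0.1], [0.7, 0.9]]])
Ti = np.array([[[0.95, 0.8], [0.05, 0.2]],
               [[0.95, 0.8], [0.05, 0.2]]])  # ideal transition (input-independent)
fi_u = np.full((nu, nx), 0.5)                # ideal control law ^I f(u | x)

horizon = 50
gamma = np.ones(nx)                          # gamma at the horizon equals 1
for _ in range(horizon):                     # backward recursion
    # omega[u, x] = sum_x' f(x'|u,x) * ln( f(x'|u,x) / (gamma(x') * ^I f(x'|u,x)) )
    omega = np.einsum('uyx,uyx->ux', T, np.log(T / (gamma[None, :, None] * Ti)))
    gamma = np.einsum('ux,ux->x', fi_u, np.exp(-omega))

fo_u = fi_u * np.exp(-omega) / gamma[None, :]  # optimal randomised law ^o f(u|x)
print(fo_u)                                    # each column sums to 1
```

After enough backward steps the law stops changing, which is the stationarity exploited by Proposition 3.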
The following proposition describes the key result
of this paper.
Proposition 3 (Solution of FPD for $\mathring{t} \to \infty$) For a given randomised admissible strategy $\{f(u_t|d(t-1))\}_{t=1}^{\mathring{t}}$, $\mathring{t} < \infty$, the KL divergence is the expected value of an additive loss function.

Let there be a controller for which the state estimate reaches its stationary form ${}^s\!f(x_t|V_t)$ and the expectation of the partial loss forming the Kullback-Leibler divergence is bounded, even for $\mathring{t} \to \infty$, by a finite constant $K$.

Then, for the horizon $\mathring{t} \to \infty$, the optimal admissible control strategy in the FPD sense is the stationary randomised one given by the pdfs
$${}^o\!f(u_t|\phi_{t-1}, V_{t-1}) = {}^I\!f(u_t|\phi_{t-1})\, \frac{\exp[-\omega(\psi_t, V_{t-1})]}{\gamma(\phi_{t-1}, V_{t-1})}, \quad t \in t^*, \qquad (7)$$

$$\gamma(\phi_{t-1}, V_{t-1}) \triangleq \int {}^I\!f(u_t|\phi_{t-1})\, \exp[-\omega(\psi_t, V_{t-1})]\, du_t.$$
The function $\omega(\psi_t, V_{t-1})$ fulfils the following equation, with ${}^s\!f(x_t|V_t)$ updated as in Proposition 1:

$$\omega(\psi_t, V_{t-1}) \triangleq \int f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})\, \ln\!\left[\frac{f(y_t|\psi_t, x_t)}{\int {}^I\!f(u_{t+1}|\phi_t)\, \exp[-\omega(\psi_{t+1}, V_t)]\, du_{t+1}} \cdot \frac{f(x_t|\psi_t, x_{t-1})}{{}^I\!f(x_t|\psi_t, x_{t-1})\; {}^I\!f(y_t|\psi_t, x_t)}\right] {}^s\!f(x_{t-1}|V_{t-1})\, dy_t\, dx_t\, dx_{t-1}. \qquad (8)$$
Proof: The KL divergence is an expectation of the logarithm of a ratio of products; thus it represents the expected value of an additive loss function. According to the assumptions, there is a strategy that makes the expectations of the partial losses bounded. The loss function that equals the KL divergence minus $\mathring{t}\,\ln(K)$, for any constant $K > 0$, is minimised by the same control law as the original KL divergence. At the same time, there exists a $K$ such that the shifted loss is bounded from above for any $\mathring{t}$, and thus its limit superior exists. The minimising strategy depends on the reached minima $\gamma$ (whose constant shifts do not change the minimising strategy), which converge, too. This implies the convergence of $\omega$ and, finally, the stationarity of the strategy obtained for the growing horizon. The function $\omega$, determining it, meets the stationary version of the non-stationary equations in Proposition 2. By excluding the intermediate functions $\gamma$, $\Omega$, the claimed final version can be obtained.
Example 2 (FPD for a normal state-space model) Let us assume a controlled system described by the normal state-space model of Example 1. Let us consider the regulation problem, which implies that we try to push all dynamics to zero while leaving the uncontrollable innovations to their fate. Therefore, the ideal pdfs are ${}^I\!f(x_t|\psi_t, x_{t-1}) = N_{x_t}(0, R)$, ${}^I\!f(y_t|\psi_t, x_t) = N_{y_t}(0, r)$, while requiring ${}^I\!f(u_t|d(t-1)) = N_{u_t}(0, q)$.

In this case, the optimal stationary control law is

$${}^o\!f(u_t|d(t-1)) = N_{u_t}(L\hat x_{t-1},\, {}^o q)$$

with

$${}^o q = (B'Q^{-1}B + D'r^{-1}D + q^{-1})^{-1},$$
$$L = -\,{}^o q\,(B'Q^{-1}A + D'r^{-1}C), \ \text{and}$$
$$Q^{-1} = A'Q^{-1}A + C'r^{-1}C - L'\,{}^o q^{-1} L + R^{-1}.$$
Note that the non-standard equation for the stationary Riccati matrix is caused by the non-standard presence of the term $Du_t$ in the observation model and by the non-standard attempt to optimise jointly the output and the state. Without this, the mean value of the optimal controller is the usual stationary control law obtained in linear-quadratic design with state penalisation $R^{-1}$ and input penalisation $q^{-1}$.

The interpretation of ${}^o q$ and of the inversion of the Riccati matrix $Q$ is non-standard: they represent the stationary covariance matrices of the optimal inputs and states, respectively.
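The coupled equations of Example 2 can be solved by fixed-point iteration. The following scalar Python sketch uses illustrative values (not from the paper) and assumes the sign convention of the linear-quadratic gain:

```python
import numpy as np

# hypothetical scalar model with the structure of the normal state-space example
A, B, C, D = 0.9, 1.0, 1.0, 0.0
R, r, q = 0.1, 0.2, 1.0              # ideal covariances of state, output, input

# iterate the coupled stationary equations for oq, L and the Riccati-like Q^{-1}
Qinv = 1.0
for _ in range(500):
    oq = 1.0 / (B * Qinv * B + D * D / r + 1.0 / q)
    L = -oq * (B * Qinv * A + D * C / r)
    Qinv = A * Qinv * A + C * C / r - L * (1.0 / oq) * L + 1.0 / R

print(oq, L)   # stationary input covariance and feedback gain
```

The converged `oq` and `L` define the stationary randomised law as a normal pdf with mean `L*xh` and covariance `oq`, with `xh` supplied by the stationary Kalman filter.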
4 DISCUSSION

The design of the optimal strategy reduces to the solution of the stationary version of the integral filtering equation (5) and of the integral equation (8). For the normal state-space model and the normal ideal pdf, it reduces to the stationary version of the Kalman filter and to the design minimising a quadratic criterion (Meditch, 1969). Even in this case, the FPD interpretation brings practical advantages, as it interprets the penalisation matrices as inversions of the ideal covariance matrices and thus guides their choice. Moreover, when they are recursively (approximately) estimated, the weights adapt to the varying noise level, which generally spares the input effort, as the control of uncontrollable innovations is given up.

Generally, closed-form solutions of the discussed equations exist rarely, but the explicit form of the control laws simplifies numerical approximations substantially. The stationary form of the solution prepares such approximations even better, as "only" the stationary functions ${}^s\!f(x_t|V_t)$ and $\omega(\psi_t, V_{t-1})$ have to be approximated (not sequences of such functions).

The non-linear character of the filtering and design equations, together with the generically high dimensionality of their domain, restricts the supply of available approximation techniques. Essentially, a global approximation suitable for higher dimensions has to be used. Neural networks (Haykin, 1994), ideally interpreted as finite probabilistic mixtures (Titterington et al., 1985), and general ANOVA-like approximations (Rabitz and Alis, 1999) seem to be prime candidates. Especially, the mixture versions look promising, as approximate techniques exist for FPD with them (Murray-Smith and Johansen, 1997; Kárný et al., 2003).
ACKNOWLEDGEMENTS

This research has been partially supported by the grants GAČR 102/03/0049, GAČR 102/03/P010 and by the grant 1M6798555601 of the Czech Ministry of Education, Youth and Sports.
REFERENCES

Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York.

Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company, New York.

Kárný, M. (1996). Towards fully probabilistic control design. Automatica, 32(12):1719–1722.

Kárný, M., Böhm, J., Guy, T., Jirsa, L., Nagy, I., Nedoma, P., and Tesař, L. (2005). Optimized Bayesian Dynamic Advising: Theory and Algorithms. Springer, London. To appear.

Kárný, M., Böhm, J., Guy, T. V., and Nedoma, P. (2003). Mixture-based adaptive probabilistic control. International Journal of Adaptive Control and Signal Processing, 17(2):119–132.

Kárný, M. and Guy, T. (2004). Fully probabilistic control design. Systems & Control Letters. Submitted.

Kullback, S. and Leibler, R. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22:79–87.

Kushner, H. (1971). Introduction to Stochastic Control. Holt, Rinehart and Winston, New York.

Lee, J. M. and Lee, J. H. (2004). Approximate dynamic programming strategies and their applicability for process control: A review and future directions. International Journal of Control, 2(3):263–278.

Meditch, J. (1969). Stochastic Optimal Linear Estimation and Control. McGraw-Hill.

Murray-Smith, R. and Johansen, T. (1997). Multiple Model Approaches to Modelling and Control. Taylor & Francis, London.

Peterka, V. (1981). Bayesian system identification. In Eykhoff, P., editor, Trends and Progress in System Identification, pages 239–304. Pergamon Press, Oxford.

Rabitz, H. and Alis, O. (1999). General foundations of high-dimensional model representations. Journal of Mathematical Chemistry, 25:197–233.

Titterington, D., Smith, A., and Makov, U. (1985). Statistical Analysis of Finite Mixtures. John Wiley & Sons, Chichester.