STATIONARY FULLY PROBABILISTIC CONTROL DESIGN

Tatiana V. Guy
Adaptive Systems Department, Institute of Information Theory and Automation
P.O. Box 18, 182 08 Prague 8, Czech Republic

Miroslav Kárný
Adaptive Systems Department, Institute of Information Theory and Automation
P.O. Box 18, 182 08 Prague 8, Czech Republic

Keywords: Stochastic control design, stationary control, fully probabilistic design, stationary state-space models.
Abstract: Stochastic control design chooses the controller that makes the closed-loop behaviour as close as possible to the desired one. Fully probabilistic design describes both the closed loop and its desired behaviour in probabilistic terms and uses the Kullback-Leibler divergence as their proximity measure. Such a design provides an explicit minimizer, which opens a way to simpler approximations of analytically infeasible cases. Current formulations are oriented towards finite-horizon design; consequently, the optimal strategy is a non-stationary one. This paper formulates and solves the infinite-horizon problem. It leads to a stationary strategy whose approximation is much easier.
1 INTRODUCTION

Stochastic control design (Kushner, 1971) chooses a control law that makes the closed-loop behaviour of the controlled system as close as possible to the desired behaviour. For a review of the numerous existing solution methods and of the major restrictions on their application, the survey (Lee and Lee, 2004) can be recommended.

In a wider context, stochastic control design can be viewed as a specific case of Bayesian dynamic decision making (Berger, 1985), which minimises the expected value of a loss function expressing the control aim. A probabilistic description of both the closed-loop behaviour of the controlled system and the desired behaviour forms the ground for fully probabilistic design (FPD) of stochastic control. This design (Kárný, 1996; Kárný et al., 2003; Kárný et al., 2005; Kárný and Guy, 2004) selects randomised control laws that make the entire joint distribution of the variables describing the closed-loop behaviour as close as possible to their desired distribution. The paper considers the asymptotic version of FPD. It suits situations in which the control horizon is large enough, and it leads to a simplified design applicable to a wider set of control problems than the non-stationary, finite-horizon version.
The next section introduces the necessary notions and notation. Section 3 recalls the FPD in the most general state-space setting and extends its solution to a growing horizon; this is the main result of the paper. Section 4 concludes the paper with a discussion of the practical significance of the result obtained.
2 PRELIMINARIES

In the paper, $\triangleq$ stands for equality by definition; $X^*$ denotes the set of all values of $X$; $\mathring{X}$ means the cardinality of a finite set $X^*$; $X(t)$ stands for the sequence $(X_1, \ldots, X_t)$; $f(\cdot|\cdot)$ denotes a probability density function (pdf); $t$ labels discrete-time moments, $t \in t^* \triangleq \{1, \ldots, \mathring{t}\}$; $\mathring{t}$ is a given control horizon that can grow up to infinity; $d_t = (y_t, u_t)$ is the finite-dimensional data record at discrete time $t$, consisting of the observed system output $y_t$ and of the optional system input $u_t$; $x_t$ stands for the finite-dimensional unobserved system state at time $t$.

Arguments distinguish the respective pdfs, and no formal distinction is made between a random variable, its realisation and an argument of a pdf. The integrals encountered are multivariate and definite, with integration domains coinciding with those of the integrands.
[Guy, T. V. and Kárný, M. (2005). Stationary Fully Probabilistic Control Design. In Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics, pages 109-112. DOI: 10.5220/0001183101090112. © SciTePress.]

The FPD exploits the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951), an information-entropy-based measure,

$$D(f\,\|\,\tilde f) \triangleq \int f(X)\,\ln\frac{f(X)}{\tilde f(X)}\,dX. \qquad (1)$$

It measures the proximity of pdfs $f$, $\tilde f$ acting on a set $X^*$ and has the following key property:

$$D(f\,\|\,\tilde f) \ge 0, \qquad D(f\,\|\,\tilde f) = 0 \ \text{iff}\ f = \tilde f \qquad (2)$$

almost everywhere on $X^*$.
The joint pdf $f(d(\mathring{t}), x(\mathring{t})|x_0, d(0))\, f(x_0|d(0)) = f(d(\mathring{t}), x(\mathring{t})|x_0)\, f(x_0)$ of all considered random variables is the most complete probabilistic description of the closed-loop behaviour. The variable $x_t$ has the character of a closed-loop state, with $x_0$ being the uncertain initial state; $d(0)$ stands for the prior information serving for the choice of the input at time $t = 1$. Further on, $d(0)$ is considered implicitly only.
The chain rule for pdfs (Peterka, 1981) implies the following decomposition of the joint pdf

$$f(d(\mathring{t}), x(\mathring{t})\,|\,x_0)\, f(x_0) = f(x_0) \prod_{t\in t^*} f(y_t|u_t, d(t-1), x(t))\, f(x_t|u_t, d(t-1), x(t-1))\, f(u_t|d(t-1), x(t-1)). \qquad (3)$$

The chosen decomposition (3) distinguishes:
the observation model $f(y_t|u_t, d(t-1), x(t))$;
the time-evolution model $f(x_t|u_t, d(t-1), x(t-1))$;
the randomised control law $f(u_t|d(t-1), x(t-1))$.
As the variable $x_t$ represents the closed-loop state, none of the above models depends on its history, i.e. $x(t) = x_t$. Moreover, the assumed admissible controllers generate the system input $u_t$ using at most the observed data history $d(t-1)$ and cannot use the unobserved states $x(t-1)$. Besides, the addressed stationary design requires all functions to be time-independent. To achieve this, the observed data $d(t-1)$, growing with time $t$, have to enter the models via a fixed-dimensional observable state $\phi_{t-1}$. Thus, the introduced closed-loop description (3) reduces to:
$$f(d(\mathring{t}), x(\mathring{t})\,|\,x_0) = \prod_{t\in t^*} f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})\, f(u_t|d(t-1)), \qquad (4)$$

where the regression vector $\psi_t \triangleq [u_t, \phi_{t-1}]$.
3 FULLY PROBABILISTIC DESIGN

The control aim and constraints are quantified by the so-called ideal pdf that defines the desired joint distribution of the considered closed-loop variables. It is constructed analogously to (4), with the user-specified factors marked by the superscript $I$:

$${}^I\!f(d(\mathring{t}), x(\mathring{t})|x_0)\, {}^I\!f(x_0) = \prod_{t\in t^*} {}^I\!f(y_t|\psi_t, x_t)\, {}^I\!f(x_t|\psi_t, x_{t-1})\, {}^I\!f(u_t|d(t-1))\, f(x_0).$$
The pdfs ${}^I\!f(y_t|\psi_t, x_t)$, ${}^I\!f(x_t|\psi_t, x_{t-1})$ describe the ideal models of observation and state evolution, and ${}^I\!f(u_t|d(t-1))$ the ideal control law. The prior pdf on the possible initial states $x_0$ cannot be influenced by the optimised control strategy, so it is left to its fate, i.e. ${}^I\!f(x_0) = f(x_0)$.
The formulation of FPD is straightforward: find an admissible control strategy minimising the KL divergence $D\big(f(d(\mathring{t}), x(\mathring{t}))\,\|\,{}^I\!f(d(\mathring{t}), x(\mathring{t}))\big)$.

The solution of the FPD requires the solution of the stochastic filtering problem in the closed loop.
Proposition 1 (Filtering in the closed loop) Let the prior pdf $f(x_0)$ be given. Then the pdf $f(x_t|d(t))$, determining the state estimate, and the pdf $f(x_t|u_t, d(t-1))$, determining the state prediction, evolve according to the coupled equations

Time updating
$$f(x_t|u_t, d(t-1)) = \int f(x_t|\psi_t, x_{t-1})\, f(x_{t-1}|d(t-1))\, dx_{t-1}$$

Data updating
$$f(x_t|d(t)) = \frac{f(y_t|\psi_t, x_t)\, f(x_t|u_t, d(t-1))}{\underbrace{\int f(y_t|\psi_t, x_t)\, f(x_t|u_t, d(t-1))\, dx_t}_{f(y_t|u_t,\, d(t-1))}}.$$

The stochastic filtering does not depend on the used admissible control strategy $\{f(u_t|d(t-1))\}_{t\in t^*}$ but only on the generated inputs.
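For a finite state set, the integrals in Proposition 1 become sums, and the coupled updates can be sketched as follows (the two-state transition table and observation likelihood are hypothetical, chosen only for illustration):

```python
import numpy as np

def time_update(T, prior):
    """Time updating: f(x_t|u_t, d(t-1)) = sum_x' f(x_t|psi_t, x') f(x'|d(t-1)).
    T[i, j] = f(x_t = i | psi_t, x_{t-1} = j) for a finite state set."""
    return T @ prior

def data_update(obs_lik, pred):
    """Data updating: Bayes' rule with the observation model as likelihood.
    obs_lik[i] = f(y_t | psi_t, x_t = i); the normaliser is f(y_t|u_t, d(t-1))."""
    post = obs_lik * pred
    return post / post.sum()

# hypothetical 2-state example: T and obs_lik are illustrative numbers only
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])                        # columns sum to 1
prior = np.array([0.5, 0.5])
pred = time_update(T, prior)                      # state prediction
post = data_update(np.array([0.7, 0.1]), pred)    # state estimate f(x_t|d(t))
print(post)
```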
Let the time-invariant state estimate ${}^s\!f(x_t|V_t) = f(x_t|d(t))$ exist, where $V_t$ is a finite-dimensional statistic. Then this function solves the equation

$${}^s\!f(x_t|V_t) = \frac{f(y_t|\psi_t, x_t) \int f(x_t|\psi_t, x_{t-1})\, {}^s\!f(x_{t-1}|V_{t-1})\, dx_{t-1}}{\underbrace{\int f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})\, {}^s\!f(x_{t-1}|V_{t-1})\, dx_{t-1}\, dx_t}_{f(y_t|\psi_t,\, V_{t-1})}} \qquad (5)$$
Proof: See, e.g., (Kárný and Guy, 2004).
Example 1 (Stationary Kalman filter) Let the time-evolution model be $f(x_t|\psi_t, x_{t-1}) = N_{x_t}(Ax_{t-1} + Bu_t, R)$, with $N_z(\hat z, w)$ denoting the normal pdf of $z$ having expectation $\hat z$ and covariance matrix $w$. Let also the observation model be normal, $f(y_t|\psi_t, x_t) = N_{y_t}(Cx_t + Du_t, r)$. Assuming that the matrices $A, B, C, D, R, r$ are known, the stationary estimate is ${}^s\!f(x_t|V_t \triangleq (\hat x_t, P)) = N_{x_t}(\hat x_t, P)$. The expectation and covariance matrix of this estimate fulfil the equations coinciding with the stationary Kalman filter (Meditch, 1969)

$$\hat x_t = A\hat x_{t-1} + Bu_t + K(y_t - C\hat x_{t-1} - Du_t), \text{ with}$$
$$K = PC'r^{-1} \quad \text{and} \quad P^{-1} = C'r^{-1}C + (APA' + R)^{-1}.$$
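The stationary covariance $P$ can be found by iterating the last equation of Example 1 to its fixed point. A scalar Python sketch follows; the system values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# hypothetical scalar system: values chosen only for illustration
A, B, C, D = 0.9, 1.0, 1.0, 0.0
R, r = 0.1, 0.2                      # state and output noise covariances

# fixed-point iteration of P^{-1} = C' r^{-1} C + (A P A' + R)^{-1}
P = 1.0
for _ in range(200):
    P = 1.0 / (C * C / r + 1.0 / (A * P * A + R))

K = P * C / r                        # stationary Kalman gain K = P C' r^{-1}
print(P, K)
```

One filter step then reads `xh = A*xh + B*u + K*(y - C*xh - D*u)` with the converged gain `K`.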
Respecting the aim of the paper, the solution of FPD is written for the stationary state estimate.
Proposition 2 (Solution of FPD) Let the state estimate have reached its stationary form ${}^s\!f(x_t|V_t)$. Then the optimal admissible control strategy in the FPD sense is the randomised one given by the pdfs

$${}^o\!f(u_t|\phi_{t-1}, V_{t-1}) = {}^I\!f(u_t|\phi_{t-1})\, \frac{\exp[-\omega(\psi_t, V_{t-1})]}{\gamma(\phi_{t-1}, V_{t-1})}, \quad t \in t^*, \qquad (6)$$

$$\gamma(\phi_{t-1}, V_{t-1}) \triangleq \int {}^I\!f(u_t|\phi_{t-1})\, \exp[-\omega(\psi_t, V_{t-1})]\, du_t.$$
Starting with $\gamma(\phi_{\mathring{t}}, V_{\mathring{t}}) \triangleq 1$, the functions $\omega(\psi_t, V_{t-1})$ are generated recursively in the backward manner, for $t = \mathring{t}, \mathring{t}-1, \ldots, 1$, as follows:

$$\omega(\psi_t, V_{t-1}) \triangleq \int \Omega(\psi_t, x_{t-1})\, {}^s\!f(x_{t-1}|V_{t-1})\, dx_{t-1}.$$
${}^s\!f(x_t|V_t)$ is updated according to Proposition 1, and

$$\Omega(\psi_t, x_{t-1}) \triangleq \int f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})\, \ln\frac{f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})}{\gamma(\phi_t, V_t)\; {}^I\!f(y_t|\psi_t, x_t)\; {}^I\!f(x_t|\psi_t, x_{t-1})}\, dy_t\, dx_t.$$
Proof: See (Kárný and Guy, 2004).
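To make the backward recursion of Proposition 2 concrete, the following sketch implements it for a hypothetical fully observed finite closed loop: the hidden state, the filtering step and the statistic V drop out, and the observation and evolution models merge into a single transition table. All numbers are illustrative:

```python
import numpy as np

# hypothetical finite closed loop: 2 states, 2 inputs; all tables illustrative
nx, nu = 2, 2
# T[u][x_next, x_prev] = f(x_t = x_next | u_t = u, x_{t-1} = x_prev)
T = np.array([[[0.9, 0.5], [0.1, 0.5]],
              [[0.3, 0.1], [0.7, 0.9]]])
Ti = np.array([[[0.95, 0.8], [0.05, 0.2]],
               [[0.95, 0.8], [0.05, 0.2]]])  # ideal transition (input-independent)
fi_u = np.full((nu, nx), 0.5)                # ideal control law ^I f(u | x)

horizon = 50
gamma = np.ones(nx)                          # gamma at the horizon equals 1
for _ in range(horizon):                     # backward recursion
    # omega[u, x] = sum_x' f(x'|u,x) * ln( f(x'|u,x) / (gamma(x') * ^I f(x'|u,x)) )
    omega = np.einsum('uyx,uyx->ux', T, np.log(T / (gamma[None, :, None] * Ti)))
    gamma = np.einsum('ux,ux->x', fi_u, np.exp(-omega))

fo_u = fi_u * np.exp(-omega) / gamma[None, :]  # optimal randomised law ^o f(u|x)
print(fo_u)                                    # each column sums to 1
```

After enough backward steps the law stops changing, which is the stationarity exploited by Proposition 3.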
The following proposition describes the key result
of this paper.
Proposition 3 (Solution of FPD for $\mathring{t} \to \infty$) For a given randomised admissible strategy $\{f(u_t|d(t-1))\}_{t=1}^{\mathring{t}}$, $\mathring{t} < \infty$, the KL divergence is the expected value of an additive loss function.

Let there be a controller for which the state estimate reaches its stationary form ${}^s\!f(x_t|V_t)$ and the expectation of the partial loss forming the Kullback-Leibler divergence is bounded, even for $\mathring{t} \to \infty$, by a finite constant $K$.

Then, for the horizon $\mathring{t} \to \infty$, the optimal admissible control strategy in the FPD sense is the stationary randomised one given by the pdfs
$${}^o\!f(u_t|\phi_{t-1}, V_{t-1}) = {}^I\!f(u_t|\phi_{t-1})\, \frac{\exp[-\omega(\psi_t, V_{t-1})]}{\gamma(\phi_{t-1}, V_{t-1})}, \quad t \in t^*, \qquad (7)$$

$$\gamma(\phi_{t-1}, V_{t-1}) \triangleq \int {}^I\!f(u_t|\phi_{t-1})\, \exp[-\omega(\psi_t, V_{t-1})]\, du_t.$$
The function $\omega(\psi_t, V_{t-1})$ fulfils the following equation, with ${}^s\!f(x_t|V_t)$ updated as in Proposition 1:

$$\omega(\psi_t, V_{t-1}) \triangleq \int f(y_t|\psi_t, x_t)\, f(x_t|\psi_t, x_{t-1})\, \ln\!\left[\frac{f(y_t|\psi_t, x_t)}{\int {}^I\!f(u_{t+1}|\phi_t)\, \exp[-\omega(\psi_{t+1}, V_t)]\, du_{t+1}} \cdot \frac{f(x_t|\psi_t, x_{t-1})}{{}^I\!f(x_t|\psi_t, x_{t-1})\; {}^I\!f(y_t|\psi_t, x_t)}\right] {}^s\!f(x_{t-1}|V_{t-1})\, dy_t\, dx_t\, dx_{t-1}. \qquad (8)$$
Proof: The KL divergence is an expectation of the logarithm of a ratio of products; thus it represents the expected value of an additive loss function. According to the assumptions, there is a strategy that makes the expectations of the partial losses bounded. The loss function that equals the KL divergence minus $\mathring{t}\,\ln(K)$, for any constant $K > 0$, is minimised by the same control law as the original KL divergence. At the same time, there exists a $K$ such that the shifted loss is bounded from above for any $\mathring{t}$, and thus its limit superior exists. The minimising strategy depends on the reached minima $\gamma$ (whose constant shifts do not change the minimising strategy), which converge, too. This implies the convergence of $\omega$ and, finally, the stationarity of the strategy obtained for the growing horizon. The function $\omega$, determining it, meets the stationary version of the non-stationary equations in Proposition 2. By excluding the intermediate functions $\gamma$, $\Omega$, the claimed final version can be obtained.
Example 2 (FPD for a normal state-space model) Let us assume a controlled system described by the normal state-space model of Example 1. Let us consider the regulation problem, which implies that we try to push all dynamics to zero while leaving the uncontrollable innovations to their fate. Therefore, the ideal pdfs are ${}^I\!f(x_t|\psi_t, x_{t-1}) = N_{x_t}(0, R)$, ${}^I\!f(y_t|\psi_t, x_t) = N_{y_t}(0, r)$, while requiring ${}^I\!f(u_t|d(t-1)) = N_{u_t}(0, q)$.

In this case, the optimal stationary control law is

$${}^o\!f(u_t|d(t-1)) = N_{u_t}(L\hat x_{t-1},\, {}^o q)$$

with

$${}^o q = (B'Q^{-1}B + D'r^{-1}D + q^{-1})^{-1},$$
$$L = -\,{}^o q\,(B'Q^{-1}A + D'r^{-1}C), \ \text{and}$$
$$Q^{-1} = A'Q^{-1}A + C'r^{-1}C - L'\,{}^o q^{-1} L + R^{-1}.$$
Note that the non-standard equation for the stationary Riccati matrix is caused by the non-standard presence of the term $Du_t$ in the observation model and by the non-standard attempt to optimise jointly the output and the state. Without this, the mean value of the optimal controller is the usual stationary control law obtained in linear-quadratic design with state penalisation $R^{-1}$ and input penalisation $q^{-1}$.

The interpretation of ${}^o q$ and of the inversion of the Riccati matrix $Q$ is non-standard: they represent the stationary covariance matrices of the optimal inputs and states, respectively.
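The coupled equations of Example 2 can be solved by fixed-point iteration. The following scalar Python sketch uses illustrative values (not from the paper) and assumes the sign convention of the linear-quadratic gain:

```python
import numpy as np

# hypothetical scalar model with the structure of the normal state-space example
A, B, C, D = 0.9, 1.0, 1.0, 0.0
R, r, q = 0.1, 0.2, 1.0              # ideal covariances of state, output, input

# iterate the coupled stationary equations for oq, L and the Riccati-like Q^{-1}
Qinv = 1.0
for _ in range(500):
    oq = 1.0 / (B * Qinv * B + D * D / r + 1.0 / q)
    L = -oq * (B * Qinv * A + D * C / r)
    Qinv = A * Qinv * A + C * C / r - L * (1.0 / oq) * L + 1.0 / R

print(oq, L)   # stationary input covariance and feedback gain
```

The converged `oq` and `L` define the stationary randomised law as a normal pdf with mean `L*xh` and covariance `oq`, with `xh` supplied by the stationary Kalman filter.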
4 DISCUSSION

The design of the optimal strategy reduces to the solution of the stationary version of the integral filtering equation (5) and of the integral equation (8). For the normal state-space model and the normal ideal pdf, it reduces to the stationary version of the Kalman filter and to the design minimising a quadratic criterion (Meditch, 1969). Even in this case, the FPD interpretation brings practical advantages, as it interprets the penalisation matrices as inversions of the ideal covariance matrices and thus guides their choice. Moreover, when they are recursively (approximately) estimated, the weights adapt to the varying noise level, which generally spares the input effort, as the control of uncontrollable innovations is given up.

Generally, closed-form solutions of the discussed equations exist rarely, but the explicit form of the control laws simplifies numerical approximations substantially. The stationary form of the solution prepares such approximations even better, as "only" the stationary functions ${}^s\!f(x_t|V_t)$ and $\omega(\psi_t, V_{t-1})$ have to be approximated (not sequences of such functions).

The non-linear character of the filtering and design equations, together with the generically high dimensionality of their domain, restricts the supply of available approximation techniques. Essentially, a global approximation suitable for higher dimensions has to be used. Neural networks (Haykin, 1994), ideally interpreted as finite probabilistic mixtures (Titterington et al., 1985), and general ANOVA-like approximations (Rabitz and Alis, 1999) seem to be prime candidates. Especially, the mixture versions look promising, as approximate techniques exist for FPD with them (Murray-Smith and Johansen, 1997; Kárný et al., 2003).
ACKNOWLEDGEMENTS

This research has been partially supported by the grants GAČR 102/03/0049, GAČR 102/03/P010 and by the grant 1M6798555601 of the Czech Ministry of Education, Youth and Sports.
REFERENCES

Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York.

Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company, New York.

Kárný, M. (1996). Towards fully probabilistic control design. Automatica, 32(12):1719–1722.

Kárný, M., Böhm, J., Guy, T., Jirsa, L., Nagy, I., Nedoma, P., and Tesař, L. (2005). Optimized Bayesian Dynamic Advising: Theory and Algorithms. Springer, London. To appear.

Kárný, M., Böhm, J., Guy, T. V., and Nedoma, P. (2003). Mixture-based adaptive probabilistic control. International Journal of Adaptive Control and Signal Processing, 17(2):119–132.

Kárný, M. and Guy, T. (2004). Fully probabilistic control design. Systems & Control Letters. Submitted.

Kullback, S. and Leibler, R. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22:79–87.

Kushner, H. (1971). Introduction to Stochastic Control. Holt, Rinehart and Winston, New York.

Lee, J. M. and Lee, J. H. (2004). Approximate dynamic programming strategies and their applicability for process control: A review and future directions. International Journal of Control, 2(3):263–278.

Meditch, J. (1969). Stochastic Optimal Linear Estimation and Control. McGraw-Hill.

Murray-Smith, R. and Johansen, T. (1997). Multiple Model Approaches to Modelling and Control. Taylor & Francis, London.

Peterka, V. (1981). Bayesian system identification. In Eykhoff, P., editor, Trends and Progress in System Identification, pages 239–304. Pergamon Press, Oxford.

Rabitz, H. and Alis, O. (1999). General foundations of high-dimensional model representations. Journal of Mathematical Chemistry, 25:197–233.

Titterington, D., Smith, A., and Makov, U. (1985). Statistical Analysis of Finite Mixtures. John Wiley & Sons, Chichester.