DUAL CONTROLLERS FOR DISCRETE-TIME STOCHASTIC AMPLITUDE-CONSTRAINED SYSTEMS

A. Królikowski and D. Horla

Poznań University of Technology

Institute of Control and Information Engineering

ul. Piotrowo 3A, 60-965 Poznań, Poland

Keywords:

Input constraint. Suboptimal dual control.

Abstract:

The paper considers a suboptimal solution to the dual control problem for discrete-time stochastic systems in the case of an amplitude constraint imposed on the control signal. The objective of the control is to minimize the variance of the output around the given reference sequence. The presented approaches are based on: an MIDC (Modified Innovation Dual Controller) derived from an IDC (Innovation Dual Controller), a TSDSC (Two-stage Dual Suboptimal Control) approach, and a PP (Pole Placement) controller. Finally, the certainty equivalence (CE) control method is included for comparative analysis. In all algorithms, the standard Kalman filter equations are applied for estimation of the unknown system parameters. An example of a second-order system is simulated in order to compare the performance of the control methods. Conclusions drawn from the simulation study are given.

1 INTRODUCTION

Much work has been done on the optimal control of stochastic systems which contain parametric uncertainty. The problem is inherently related to the dual control problem originally presented by Fel’dbaum, who suggested that in dual control the problems of learning and control should be considered simultaneously in order to minimize the cost function. In general, learning and controlling have contradictory goals, particularly for finite horizon control problems. The concept of duality has inspired the development of many control techniques which involve the dual effect of the control signal. They can be separated into two classes: explicit dual and implicit dual (Bayard and Eslami, 1985). Unfortunately, the dual approach does not result in computationally feasible optimal algorithms. A variety of suboptimal solutions has been proposed, and many of them are heuristic identifier-controller structures. Other controllers, such as minimax controllers (Sebald, 1979), Bayes controllers (Sworder, 1966) or MRAC (Model Reference Adaptive Controller) (Åström and Wittenmark, 1989), are also available.

The objective of this paper is to present and compare different approaches to a suboptimal solution of the minimum variance control problem for discrete-time stochastic systems with unknown parameters. An amplitude-constrained control input is considered, which is an important practical case. The majority of the solutions proposed in the literature do not include the input constraint in the design of the control system. The saturation imposed on the control signal makes the probability density function (pdf) of the state deviate from the Gaussian, which makes finding an optimal control difficult even when the system parameters are known. The dual methods described here are: the MIDC method, which is a modification of the IDC approach (R. Milito and Cadorin, 1982), the method based on the two-stage dual suboptimal control (TSDSC) approach (Maitelliand and Yoneyama, 1994), and the method based on the pole placement approach (Filatov and Unbehauen, 2004).

The Iteration in Policy Space (IPS) algorithm and its reduced complexity version were proposed by Bayard (Bayard, 1991) for a general nonlinear system. In this algorithm the stochastic dynamic programming equations are solved forward in time, using a nested stochastic approximation technique. The method is based on a specific computational architecture denoted as an H block. The method needs a filter propagating the state and parameter estimates with associated covariance matrices.

Królikowski A. and Horla D. (2007). DUAL CONTROLLERS FOR DISCRETE-TIME STOCHASTIC AMPLITUDE-CONSTRAINED SYSTEMS. In Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics, pages 130-134. DOI: 10.5220/0001620401300134. SciTePress.

In (Królikowski, 2000), some modifications including the input constraint were introduced into the original version of the IPS algorithm, and its performance was compared with that of the MIDC algorithm.

This paper has a tutorial nature, and the possibility of incorporating the input constraint into the control algorithms was the motivation for the selection of the overviewed approaches.

The performance of the considered algorithms is illustrated by a simulation study of a second-order system with the control signal constrained in amplitude.

2 CONTROL PROBLEM FORMULATION

Consider a discrete-time linear single-input single-output system described by the ARX model

$$A(q^{-1})\,y_k = B(q^{-1})\,u_k + w_k, \tag{1}$$

where $A(q^{-1}) = 1 + a_{1,k}q^{-1} + \cdots + a_{na,k}q^{-na}$, $B(q^{-1}) = b_{1,k}q^{-1} + \cdots + b_{nb,k}q^{-nb}$, $y_k$ is the output available for measurement, $u_k$ is the control signal, and $\{w_k\}$ is a sequence of independent identically distributed Gaussian variables with zero mean and variance $\sigma_w^2$. The process noise $w_k$ is statistically independent of the initial condition $y_0$. The system (1) is parametrized by a vector $\theta_k$ containing $na + nb$ unknown parameters $\{a_{i,k}\}$ and $\{b_{i,k}\}$ which in general can be assumed to vary according to the equation

$$\theta_{k+1} = \Phi\theta_k + e_k, \tag{2}$$

where $\Phi$ is a known matrix and $\{e_k\}$ is a sequence of independent identically distributed Gaussian variables with zero mean and variance matrix $R_e$. Particularly, for the constant parameters we have

$$\theta_{k+1} = \theta_k = \theta = (b_1, \cdots, b_{nb}, a_1, \cdots, a_{na})^T, \tag{3}$$

and then $\Phi = I$, $e_k = 0$ in (2).
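To make the model concrete, the following sketch simulates (1) for the constant-parameter case (3). It is an illustration only: the function name `simulate_arx`, the zero initial conditions and the use of a standard deviation `sigma_w` (the square root of the noise variance) are assumptions made here, not details taken from the paper.

```python
import random


def simulate_arx(a, b, u, sigma_w, seed=0):
    """Simulate A(q^-1) y_k = B(q^-1) u_k + w_k with constant parameters.

    a = [a_1, ..., a_na], b = [b_1, ..., b_nb], u = [u_0, u_1, ...].
    sigma_w is the noise standard deviation (assumed name).
    Returns [y_0, ..., y_len(u)], starting from y_0 = 0 (assumed).
    """
    rng = random.Random(seed)
    y = [0.0]
    for k in range(len(u)):
        # y_{k+1} = -sum_i a_i y_{k+1-i} + sum_i b_i u_{k+1-i} + w_{k+1}
        acc = 0.0
        for i, a_i in enumerate(a, start=1):
            idx = k + 1 - i
            if idx >= 0:
                acc -= a_i * y[idx]
        for i, b_i in enumerate(b, start=1):
            idx = k + 1 - i
            if idx >= 0:
                acc += b_i * u[idx]
        acc += rng.gauss(0.0, sigma_w)
        y.append(acc)
    return y


if __name__ == "__main__":
    # Noise-free impulse response of a second-order example.
    print(simulate_arx([-1.8, 0.9], [1.0, 0.5], [1.0, 0.0, 0.0], sigma_w=0.0))
```

Samples with negative time index are treated as zero, which is one simple way to handle the initial conditions.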

The control signal is subjected to an amplitude constraint

$$|u_k| \le \alpha \tag{4}$$

and the information state $I_k$ at time $k$ is defined by

$$I_k = [y_k, \ldots, y_1, u_{k-1}, \ldots, u_0, I_0] \tag{5}$$

where $I_0$ denotes the initial conditions.
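The saturation operator sat(·; α) appears in the control laws of Section 3 but is not written out in the text. A common reading, assumed here, is clipping to the interval [−α, α], which enforces (4) by construction. A minimal Python sketch (the function name `sat` mirrors the paper's notation; the argument check is an added assumption):

```python
def sat(u, alpha):
    """Clip u to the interval [-alpha, alpha] (assumed meaning of sat(u; alpha))."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return max(-alpha, min(alpha, u))


if __name__ == "__main__":
    for value in (-3.0, 0.25, 3.0):
        print(value, "->", sat(value, 1.0))
```

Because this function is odd, sat(−u; α) = −sat(u; α), so a leading minus sign can be moved inside or outside the operator without changing the result.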

An admissible control policy $\Pi$ is defined by a sequence of controls $\Pi = [u_0, \ldots, u_{N-1}]$ where each control $u_k$ is a function of $I_k$ and satisfies the constraint (4). The control objective is to find a control policy $\Pi$ which minimizes the following expected cost function

$$J = E\Big[\sum_{k=0}^{N-1} (y_{k+1} - r_{k+1})^2\Big] \tag{6}$$

where $\{r_k\}$ is a given reference sequence. An admissible control policy minimizing (6) can be labelled CCLO (Constrained Closed-Loop Optimal) in keeping with the standard nomenclature, i.e. $\Pi^{CCLO} = [u_0^{CCLO}, \ldots, u_{N-1}^{CCLO}]$. This control policy has no closed form, and the control policies presented in the following section can be viewed as suboptimal approaches to the CCLO policy.

3 SUBOPTIMAL DUAL CONTROL METHODS

In this section, we shall briefly describe three methods giving an approximate solution to the problem formulated in Section 2. The first one is the MIDC algorithm based on the IDC approach (R. Milito and Cadorin, 1982), which is an explicit dual control approach.

3.1 Method based on the Innovation Dual Control (IDC) Approach: Derivation of $\Pi^{MIDC}$

The IDC has been derived for system (1) with unconstrained control and constant parameters (3). The following cost function was considered

$$J = \tfrac{1}{2}E\big[(y_{k+1} - r_{k+1})^2 - \lambda_{k+1}\varepsilon_{k+1}^2\big] \tag{7}$$

where $\lambda_{k+1} \ge 0$ is the learning weight, and $\varepsilon_{k+1}$ is the innovation, see (16).

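The modified IDC, $u_k^{MIDC}$, takes the constraint into account, which results in the following closed-form expression

$$u_k^{MIDC} = -\mathrm{sat}\left(\frac{\big[(1-\lambda_{k+1})\,p_{b_1\theta^*,k}^T + \hat b_{1,k}\,\hat\theta_k^{*T}\big]s_k^* - \hat b_{1,k}\,r_{k+1}}{(1-\lambda_{k+1})\,p_{b_1,k} + \hat b_{1,k}^2};\ \alpha\right) \tag{8}$$

where

$$s_k = (u_k, u_{k-1}, \ldots, u_{k-nb+1}, -y_k, \ldots, -y_{k-na+1})^T = (u_k, s_k^{*T})^T, \tag{9}$$

and the following partitioning is introduced for the parameter covariance matrix $P_k$

$$P_k = \begin{bmatrix} p_{b_1,k} & p_{b_1\theta^*,k}^T \\ p_{b_1\theta^*,k} & P_{\theta^*,k} \end{bmatrix} \tag{10}$$

corresponding to the partition of $\theta$

$$\theta = (b_1, \theta^{*T})^T \tag{11}$$

with

$$\theta^* = (b_2, \ldots, b_{nb}, a_1, \ldots, a_{na})^T. \tag{12}$$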
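A short worked derivation may help to see where the closed form (8) comes from. The sketch below is not taken from the paper: it assumes that the estimate of θ is the conditional mean given the information state, with covariance P partitioned into a scalar for b_1, a cross-covariance vector and a block for θ*, and that the noise is independent of the estimation error. Positive constant factors in the cost are ignored because they do not change the minimizer.

```latex
% Assumptions (not stated in the source): \hat\theta_k = E[\theta \mid I_k] with covariance P_k,
% and w_{k+1} independent of the estimation error.
\begin{align*}
y_{k+1} &= s_k^T\theta + w_{k+1}, \qquad
\varepsilon_{k+1} = y_{k+1} - s_k^T\hat\theta_k, \\
E[(y_{k+1}-r_{k+1})^2 \mid I_k] &= (s_k^T\hat\theta_k - r_{k+1})^2 + s_k^T P_k s_k + \sigma_w^2, \\
E[\varepsilon_{k+1}^2 \mid I_k] &= s_k^T P_k s_k + \sigma_w^2, \\
s_k^T\hat\theta_k &= \hat b_{1,k}u_k + \hat\theta_k^{*T}s_k^*, \\
s_k^T P_k s_k &= p_{b_1,k}u_k^2 + 2u_k\,p_{b_1\theta^*,k}^T s_k^* + s_k^{*T}P_{\theta^*,k}s_k^*, \\
J &\propto (\hat b_{1,k}u_k + \hat\theta_k^{*T}s_k^* - r_{k+1})^2
   + (1-\lambda_{k+1})\,(s_k^T P_k s_k + \sigma_w^2), \\
0 = \tfrac{1}{2}\frac{\partial J}{\partial u_k}
  &= \hat b_{1,k}(\hat b_{1,k}u_k + \hat\theta_k^{*T}s_k^* - r_{k+1})
   + (1-\lambda_{k+1})(p_{b_1,k}u_k + p_{b_1\theta^*,k}^T s_k^*), \\
u_k &= -\frac{\big[(1-\lambda_{k+1})p_{b_1\theta^*,k}^T + \hat b_{1,k}\hat\theta_k^{*T}\big]s_k^* - \hat b_{1,k}r_{k+1}}
            {(1-\lambda_{k+1})p_{b_1,k} + \hat b_{1,k}^2}.
\end{align*}
```

When the denominator is positive the cost is a convex parabola in the current control, so under the amplitude constraint (4) the minimizer is the unconstrained one clipped to [−α, α], which is the role of sat(·; α) in (8).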

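The estimates $\hat\theta_k$ needed to calculate $u_k^{MIDC}$ can be obtained in many ways. A common way is to use the standard Kalman filter in the form of a suitable recursive procedure for parameter estimation, i.e.

$$\hat\theta_{k+1} = \Phi\hat\theta_k + k_{k+1}\varepsilon_{k+1}, \tag{13}$$

$$k_{k+1} = \Phi P_k s_k\,[s_k^T P_k s_k + \sigma_w^2]^{-1}, \tag{14}$$

$$P_{k+1} = [\Phi - k_{k+1}s_k^T]\,P_k\Phi^T + R_e, \tag{15}$$

$$\varepsilon_{k+1} = y_{k+1} - s_k^T\hat\theta_k. \tag{16}$$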
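One step of the recursion (13)-(16) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function name `kalman_step`, the argument `sigma2_w` for the noise variance and the list-of-lists matrix representation are choices made here. When `Phi` and `R_e` are omitted, the constant-parameter case (Φ = I, zero parameter noise) is assumed.

```python
def kalman_step(theta, P, s, y_next, sigma2_w, Phi=None, R_e=None):
    """One parameter-estimation step following (13)-(16).

    theta: current estimate (list), P: covariance (list of lists),
    s: regressor s_k (list), y_next: measured y_{k+1}.
    Returns (theta_next, P_next, innovation).
    """
    n = len(theta)
    if Phi is None:  # constant parameters: Phi = I
        Phi = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if R_e is None:  # constant parameters: no parameter noise
        R_e = [[0.0] * n for _ in range(n)]

    Ps = [sum(P[i][j] * s[j] for j in range(n)) for i in range(n)]
    denom = sum(s[i] * Ps[i] for i in range(n)) + sigma2_w
    Phi_Ps = [sum(Phi[i][j] * Ps[j] for j in range(n)) for i in range(n)]
    gain = [v / denom for v in Phi_Ps]                               # (14)

    eps = y_next - sum(s[i] * theta[i] for i in range(n))            # (16)
    Phi_theta = [sum(Phi[i][j] * theta[j] for j in range(n)) for i in range(n)]
    theta_next = [Phi_theta[i] + gain[i] * eps for i in range(n)]    # (13)

    # (15): P_{k+1} = [Phi - k s^T] P Phi^T + R_e
    M = [[Phi[i][j] - gain[i] * s[j] for j in range(n)] for i in range(n)]
    MP = [[sum(M[i][m] * P[m][j] for m in range(n)) for j in range(n)]
          for i in range(n)]
    P_next = [[sum(MP[i][m] * Phi[j][m] for m in range(n)) + R_e[i][j]
               for j in range(n)] for i in range(n)]
    return theta_next, P_next, eps


if __name__ == "__main__":
    print(kalman_step([0.0], [[10.0]], [1.0], 2.0, 1.0))
```

For the pole placement method of Section 3.3 the same step could be reused with the regressor and variance swapped as described there.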

3.2 Method based on the Two-stage Dual Suboptimal Control (TSDSC) Approach: Derivation of $\Pi^{TSDSC}$

The TSDSC proposed in (Maitelliand and Yoneyama, 1994) has been derived for system (1) with stochastic parameters (2). Below, this method is extended to the input-constrained case. The cost function considered for TSDSC is given by

$$J = \tfrac{1}{2}E\big[(y_{k+1} - r)^2 + (y_{k+2} - r)^2\big] \tag{17}$$

and, according to (Maitelliand and Yoneyama, 1994), can be obtained as a quadratic form in $u_k$ and $u_{k+1}$, i.e.

$$J = \tfrac{1}{2}\big[a u_k + b u_{k+1} + c u_k u_{k+1} + d u_k^2 + e u_{k+1}^2\big] \tag{18}$$

where $a$, $b$, $c$, $d$, $e$ are expressions depending on the current data $s_k^*$, the reference signal $r$ and the parameter estimates $\hat\theta_k$ (Maitelliand and Yoneyama, 1994). Solving the necessary optimality condition, the unconstrained control signal is

$$u_k^{TSDSC,un} = \frac{bc - 2ae}{4de - c^2}. \tag{19}$$

This control law was used for simulation analysis in (Maitelliand and Yoneyama, 1994). Imposing the cutoff, the constrained control signal is

$$u_k^{TSDSC,co} = \mathrm{sat}(u_k^{TSDSC,un};\ \alpha). \tag{20}$$
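The step from the quadratic cost to (19) and (20) can be checked numerically. The sketch below assumes the cost has linear terms in a and b, a cross term in c and squared terms in d and e, which is the arrangement consistent with (19); setting both partial derivatives to zero gives a 2×2 linear system whose first component is (19). The function names are hypothetical, and the coefficients are treated as given numbers because their expressions are only available in the cited reference.

```python
def tsdsc_unconstrained(a, b, c, d, e):
    """Stationary point of a*u0 + b*u1 + c*u0*u1 + d*u0**2 + e*u1**2.

    Returns (u_k, u_{k+1}); the first component matches (19).
    """
    det = 4.0 * d * e - c * c
    if d <= 0.0 or det <= 0.0:
        raise ValueError("quadratic form is not positive definite")
    u_k = (b * c - 2.0 * a * e) / det
    u_k1 = (a * c - 2.0 * b * d) / det
    return u_k, u_k1


def tsdsc_cutoff(a, b, c, d, e, alpha):
    """Cutoff control (20): clip the unconstrained u_k to [-alpha, alpha]."""
    u_k, _ = tsdsc_unconstrained(a, b, c, d, e)
    return max(-alpha, min(alpha, u_k))


if __name__ == "__main__":
    print(tsdsc_unconstrained(-2.0, -1.0, 1.0, 1.0, 1.0))
    print(tsdsc_cutoff(-2.0, -1.0, 1.0, 1.0, 1.0, 0.5))
```

Note that the cutoff only clips the first control and ignores how the constraint on the second control would shift the optimum, which is what distinguishes it from a constrained minimization.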

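The cost function (18) can be represented as a quadratic form

$$J = \tfrac{1}{2}\big[\mathbf{u}_k^T A\,\mathbf{u}_k + \mathbf{b}^T\mathbf{u}_k\big] \tag{21}$$

where $\mathbf{u}_k = (u_k, u_{k+1})^T$, and

$$A = \begin{bmatrix} d & c/2 \\ c/2 & e \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} a \\ b \end{bmatrix}. \tag{22}$$

The condition $4de - c^2 > 0$ together with $d > 0$ implies positive definiteness of $A$ and guarantees convexity. Minimization of (21) under constraint (4) is a standard QP problem resulting in $u_k^{TSDSC,qp}$. The constrained control $u_k^{TSDSC,qp}$ is then applied to the system in a receding horizon framework.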
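For a two-variable convex quadratic with box constraints, the QP can be solved exactly without a general-purpose solver. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: it assumes the amplitude constraint applies to both controls in the two-step horizon, uses the coefficient arrangement that is consistent with (19), and enumerates the interior stationary point plus the clipped one-dimensional minimizers on the four edges of the box. The function name `tsdsc_qp` is hypothetical.

```python
def tsdsc_qp(a, b, c, d, e, alpha):
    """Minimise a*u0 + b*u1 + c*u0*u1 + d*u0**2 + e*u1**2
    subject to |u0| <= alpha and |u1| <= alpha (assumed constraint set).

    Returns the minimising pair (u_k, u_{k+1}); only u_k would be applied
    in a receding horizon scheme.
    """
    det = 4.0 * d * e - c * c
    if d <= 0.0 or det <= 0.0:
        raise ValueError("quadratic form is not positive definite")

    def cost(u0, u1):
        return a * u0 + b * u1 + c * u0 * u1 + d * u0 * u0 + e * u1 * u1

    def clip(v):
        return max(-alpha, min(alpha, v))

    candidates = []
    # Interior stationary point, kept only if it satisfies the constraints.
    u0 = (b * c - 2.0 * a * e) / det
    u1 = (a * c - 2.0 * b * d) / det
    if abs(u0) <= alpha and abs(u1) <= alpha:
        candidates.append((u0, u1))
    # Each edge is a 1-D convex problem: clip its unconstrained minimiser.
    for fixed in (-alpha, alpha):
        candidates.append((clip(-(a + c * fixed) / (2.0 * d)), fixed))
        candidates.append((fixed, clip(-(b + c * fixed) / (2.0 * e))))
    return min(candidates, key=lambda u: cost(u[0], u[1]))


if __name__ == "__main__":
    # Unconstrained optimum is about (-1.33, 2.67); with alpha = 2 the
    # constrained optimum moves the first control as well.
    print(tsdsc_qp(0.0, -4.0, 1.0, 1.0, 1.0, 2.0))
```

In this example the cutoff rule would leave the first control at its unconstrained value, whereas the constrained minimization shifts it because the second control is held at the bound. This shows that the two constrained variants can produce different control signals.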

3.3 Method based on the Pole Placement (PP) Approach: Derivation of $\Pi^{PP}$

Let the desired stable closed-loop polynomial be described by $A^*(q^{-1}) = 1 + a_1^*q^{-1} + \cdots + a_{n^*}^*q^{-n^*}$. A dual version of a direct adaptive PP controller proposed in (N.M. Filatov and Keuchel, 1993; Filatov and Unbehauen, 2004) has been derived for system (1), where integral actions can be included. To this end, a bicriterial approach has been used to solve the synthesis problem. The two criteria correspond to the two goals of dual adaptive control, namely to keep the system output close to the reference signal, and to accelerate the parameter estimation process for future control improvement. Incorporating the amplitude constraint of the control input yields

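$$u_k^{PP} = \mathrm{sat}\Big(u_k^{CAUT} + \eta\,\mathrm{tr}P_k\,\mathrm{sign}\big(p_{d_0,k}\,\bar u_k^{CAUT} + p_{d_0p_1,k}^T\,m_{1,k}\big);\ \alpha\Big) \tag{23}$$

where $u_k^{CAUT}$ is the cautious action given by

$$u_k^{CAUT} = -\frac{\big(p_{r_0p_0,k}^T + \hat p_{0,k}^T\,\hat r_{0,k}\big)\,m_{0,k} - \hat r_{0,k}\,[\ldots]}{p_{r_0,k} + \hat r_{0,k}^2}, \tag{24}$$

$\bar u_k^{CAUT} = u_k^{CAUT} + \sum_{i=1}^{n^*} a_i^* u_{k-i}$, $p_0 = (s_0, \ldots, s_{ns}, r_1, \ldots, r_{nr})^T$, $m_{0,k} = (y_k, \ldots, y_{k-ns}, u_{k-1}, \ldots, u_{k-nr})^T$, and $\eta \ge 0$ is the parameter responsible for probing. In this case the following partitioning is introduced for the parameter covariance matrix $P_k$

$$P_k = \begin{bmatrix} p_{d_0,k} & p_{d_0p_1,k}^T \\ p_{d_0p_1,k} & P_{p_1,k} \end{bmatrix} \tag{25}$$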

corresponding to the partition of the parameter vector $p$

$$p = (-d_0, p_1^T)^T \tag{26}$$

where

$$p_1 = (-d_1, \ldots, -d_{nd}, -f_1, \ldots, -f_{nf}, r_1, \ldots, r_{nr}, s_0, \ldots, s_{ns})^T \tag{27}$$

ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics


and

$$m_k = (\bar u_k, m_{1,k}^T)^T \tag{28}$$

with $m_{1,k} = (\bar u_{k-1}, \ldots, \bar u_{k-nd}, \bar y_k, \ldots, \bar y_{k-nf+1}, u_{k-l+1}, \ldots, u_{k-l-nr+2}, y_{k-l+2}, \ldots, y_{k-l+ns+2})^T$. The filtered output and input signals are obtained as $\bar y_k = A^*(q^{-1})\,y_k$, $\bar u_k = A^*(q^{-1})\,u_k$.

The corresponding Diophantine equation and Bezout identity are

$$A(q^{-1})[r_0 + q^{-1}R(q^{-1})] + q^{-1}B(q^{-1})S(q^{-1}) = r_0A^*(q^{-1}), \tag{29}$$

$$A(q^{-1})D(q^{-1}) + B(q^{-1})F(q^{-1}) = r_0q^{-l+2}, \tag{30}$$

where the polynomial degrees are: $nr = na - 1$, $ns = na - \kappa - 1$, $l = na + nb$, $nd = nb - 2$, $nf = na - 1$, and $\kappa$ is the number of possible integrators in the system.

It can be shown that the filtered output $\bar y_k$ can be represented in the following regressor form

$$\bar y_k = p^T m_{k-1} + v_k. \tag{31}$$

For estimation of the parameters $p$ (note that the parameters $p_0$ are included in $p_1$) the Kalman filter algorithm (13)-(16) can again be used, where $\hat\theta_k$ should be replaced by $\hat p_k$, $s_k$ should be replaced by $m_k$, $\varepsilon_{k+1}$ should be calculated as $\varepsilon_{k+1} = \bar y_{k+1} - m_k^T\hat p_k$, and the variance $\sigma_w^2$ should be replaced by the variance $\sigma_v^2$, which can be evaluated from (29), (30), (1).

4 SIMULATION TESTS

The performance of the described control methods is illustrated through the example of a second-order system with the following true values: $a_1 = -1.8$, $a_2 = 0.9$, $b_1 = 1.0$, $b_2 = 0.5$, where the Kalman filter algorithm (13)-(16) was applied for estimation. The initial parameter estimates were taken as half their true values, with $P_0 = 10I$. The reference signal was a square wave $\pm 3$, so the minimal value of the constraint $\alpha$ ensuring tracking is $\alpha_{min} = 3\,|A(1)|/|B(1)| = 0.2$. Fig. 1 shows the reference, output and input signals during the tracking process under the constraint $\alpha = 1$ for all control policies.
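The value of the minimal constraint follows from the steady-state gain B(1)/A(1) of the model: holding the output at a constant level of magnitude 3 requires a constant input of magnitude 3|A(1)|/|B(1)|. A small sketch of this arithmetic for the example system (the function name `alpha_min` is introduced here for illustration):

```python
def alpha_min(a, b, r_amp):
    """Smallest input bound that still allows tracking a constant level r_amp.

    a = [a_1, ..., a_na], b = [b_1, ..., b_nb];
    A(1) = 1 + sum(a), B(1) = sum(b).
    """
    A1 = 1.0 + sum(a)
    B1 = sum(b)
    if B1 == 0.0:
        raise ValueError("B(1) = 0: zero steady-state gain")
    return r_amp * abs(A1) / abs(B1)


if __name__ == "__main__":
    # A(1) = 1 - 1.8 + 0.9 = 0.1, B(1) = 1.0 + 0.5 = 1.5
    print(alpha_min([-1.8, 0.9], [1.0, 0.5], 3.0))
```

For the example values this evaluates to 0.2 up to floating-point rounding, in agreement with the text.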

For the control policy $\Pi^{MIDC}$ the constant learning weight was $\lambda_k = \lambda = 0.98$. The policy $\Pi^{PP}$ was simulated for a third-order polynomial $A^*(q^{-1})$ having poles at $0.2 \pm i0.1$, $-0.1$, and for the probing weight $\eta = 0.2$. The control policy $\Pi^{CE}$ can easily be obtained from MIDC by taking $p_{b_1,k} = 0$, $p_{b_1\theta^*,k} = 0$.
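The relation between the MIDC law (8) and its certainty equivalence special case can be illustrated with a short sketch. It assumes the reading of (8) in which the numerator combines the cross-covariance and the estimates of θ* with the data vector s*, and the denominator combines the squared estimate of b_1 with its variance. The function and argument names are hypothetical, and the covariance is passed as a full matrix whose first row holds the variance of b_1 followed by the cross-covariances.

```python
def sat(x, alpha):
    """Clip x to [-alpha, alpha]."""
    return max(-alpha, min(alpha, x))


def midc_control(theta_hat, P, s_star, r_next, lam, alpha):
    """MIDC control following the assumed reading of (8).

    theta_hat = [b1_hat, theta_star_hat...], P = covariance (list of lists),
    s_star = data vector without u_k, r_next = r_{k+1}, lam = learning weight.
    """
    b1 = theta_hat[0]
    theta_star = theta_hat[1:]
    p_b1 = P[0][0]          # variance of the b1 estimate
    p_cross = P[0][1:]      # cross-covariance between b1 and theta*
    if not (len(theta_star) == len(p_cross) == len(s_star)):
        raise ValueError("dimension mismatch")
    num = sum(((1.0 - lam) * pc + b1 * th) * s
              for pc, th, s in zip(p_cross, theta_star, s_star)) - b1 * r_next
    den = b1 * b1 + (1.0 - lam) * p_b1
    if den <= 0.0:
        raise ValueError("non-positive denominator: cost not convex in u_k")
    return -sat(num / den, alpha)


if __name__ == "__main__":
    theta_hat = [1.0, 0.5, -1.8, 0.9]           # (b1, b2, a1, a2)
    zero_P = [[0.0] * 4 for _ in range(4)]      # CE: covariance ignored
    s_star = [1.0, -2.0, -1.0]                  # (u_{k-1}, -y_k, -y_{k-1})
    print(midc_control(theta_hat, zero_P, s_star, 3.0, 0.98, 1.0))
```

With the covariance entries set to zero the law reduces to solving for the control that makes the predicted output equal to the reference, followed by clipping, which is the certainty equivalence controller. A nonzero variance of the b_1 estimate enlarges the denominator and makes the control more cautious when the learning weight is below one.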

Next, the simulated performance index

$$J = \sum_{k=0}^{N-1}(y_{k+1} - r_{k+1})^2$$

was considered. The plots of $J$ versus the constraint $\alpha$ are shown in Figs. 2, 3 for $\sigma_w^2 = 0.05, 0.1$, respectively, and $N = 1000$. The control $u_k^{TSDSC,qp}$ was obtained by solving the minimization of the quadratic form (21) using the MATLAB function quadprog. The performance of this control is not included in the plots of Figs. 2, 3 because, surprisingly, it is substantially inferior to that of $u_k^{TSDSC,co}$. In the latter case, a short-term behaviour phenomenon (G.P. Chen and Hope, 1993) can be observed in Figs. 2, 3. This means that when the cutoff method is used, a range of the constraint $\alpha$ can be found where the performance index increases with increasing $\alpha$.
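The simulated performance index is the sum of squared tracking errors over one run. A minimal sketch of how it could be evaluated from recorded sequences (the function name and the convention that both lists start at index 0 are assumptions made here):

```python
def performance_index(y, r):
    """Sum over k = 0..N-1 of (y_{k+1} - r_{k+1})**2.

    y = [y_0, ..., y_N] and r = [r_0, ..., r_N]; the first samples are skipped
    because the sum starts at y_1.
    """
    if len(y) != len(r):
        raise ValueError("y and r must have the same length")
    return sum((y_k - r_k) ** 2 for y_k, r_k in zip(y[1:], r[1:]))


if __name__ == "__main__":
    print(performance_index([0.0, 1.0, 2.0], [0.0, 3.0, 3.0]))
```

Repeating such a run for a grid of constraint values and plotting the index against the constraint would give curves of the kind described for Figs. 2, 3.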

5 CONCLUSIONS

This paper presents various approaches toward a suboptimal solution to the discrete-time dual control problem with an amplitude-constrained control signal. A simulation example of a second-order system is given, and the performance of the presented control policies is compared by means of the simulated performance index.

The MIDC method seems to be a good suboptimal dual control approach; however, it has been found that the MIDC control is quite sensitive to the value of the learning weight $\lambda$. In (Królikowski, 2000) it has been found that this method often performs very close to the IPS algorithm (Bayard, 1991).

The performance of all control policies except $u_k^{TSDSC,co}$ is comparable; however, the differences between the methods become less noticeable as the constraint $\alpha$ gets tight, i.e. as $\alpha \to \alpha_{min}$. For all considered control policies except $u_k^{TSDSC,co}$, the performance index increases as the input amplitude constraint gets tighter. This means that for $u_k^{TSDSC,co}$ the effect of the short-term behaviour phenomenon discussed in (G.P. Chen and Hope, 1993) could appear.

REFERENCES

Åström, K. J. and Wittenmark, B. (1989). Adaptive Control. Addison-Wesley.

Bayard, D. (1991). A forward method for optimal stochastic nonlinear and adaptive control. IEEE Trans. Automat. Contr., 9:1046–1053.

Bayard, D. and Eslami, M. (1985). Implicit dual control for general stochastic systems. Opt. Contr. Appl. & Methods, 6:265–279.

Filatov, N. and Unbehauen, H. (2004). Dual Control. Springer.

G.P. Chen, O. M. and Hope, G. (1993). Control limits consideration in discrete control system design. IEE Proc.-D, 140(6):413–422.

Królikowski, A. (2000). Suboptimal LQG discrete-time control with amplitude-constrained input: dual versus non-dual approach. European J. Control, 6:68–76.

Maitelliand, A. and Yoneyama, T. (1994). A two-stage dual suboptimal controller for stochastic systems using approximate moments. Automatica, 30:1949–1954.

N.M. Filatov, H. U. and Keuchel, U. (1993). Dual pole-placement controller with direct adaptation. Automatica, 33(1):113–117.

R. Milito, C.S. Padilla, R. P. and Cadorin, D. (1982). An innovations approach to dual control. IEEE Trans. Automat. Contr., 1:132–137.

Sebald, A. (1979). Toward a computationally efficient optimal solution to the LQG discrete-time dual control problem. IEEE Trans. Automat. Contr., 4:535–540.

Sworder, D. (1966). Optimal Adaptive Control Systems. Academic Press, New York.

Figure 1: Reference, output and control signals for $\Pi^{MIDC}$, $\Pi^{TSDSC}$, $\Pi^{PP}$ and $\alpha = 1$. [Plot not reproduced; recoverable panel titles: MIDC; TSDSC, co; TSDSC − opt.; horizontal axis from 0 to 120.]

Figure 2: Plots of performance indices for $\sigma_w^2 = 0.05$. [Plot not reproduced; horizontal axis: $\alpha$ from 0.2 to 2; legend: MIDC, TSDSC,co; plot title: Performance indices, $\lambda = 0.98$ for MIDC, $N = 1000$.]

Figure 3: Plots of performance indices for $\sigma_w^2 = 0.1$. [Plot not reproduced; horizontal axis: $\alpha$ from 0.2 to 2; legend: MIDC, TSDSC,co; plot title: Performance indices, $\lambda = 0.98$ for MIDC, $N = 1000$.]
